From: Jane Chu <jane.chu@oracle.com>
To: Christoph Hellwig <hch@infradead.org>
Cc: "nvdimm@lists.linux.dev" <nvdimm@lists.linux.dev>,
	"dave.jiang@intel.com" <dave.jiang@intel.com>,
	"snitzer@redhat.com" <snitzer@redhat.com>,
	"djwong@kernel.org" <djwong@kernel.org>,
	"x86@kernel.org" <x86@kernel.org>,
	"david@fromorbit.com" <david@fromorbit.com>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	"willy@infradead.org" <willy@infradead.org>,
	"linux-xfs@vger.kernel.org" <linux-xfs@vger.kernel.org>,
	"dm-devel@redhat.com" <dm-devel@redhat.com>,
	"vgoyal@redhat.com" <vgoyal@redhat.com>,
	"vishal.l.verma@intel.com" <vishal.l.verma@intel.com>,
	"linux-fsdevel@vger.kernel.org" <linux-fsdevel@vger.kernel.org>,
	"dan.j.williams@intel.com" <dan.j.williams@intel.com>,
	"ira.weiny@intel.com" <ira.weiny@intel.com>,
	"agk@redhat.com" <agk@redhat.com>
Subject: Re: [dm-devel] [PATCH v7 4/6] dax: add DAX_RECOVERY flag and .recovery_write dev_pgmap_ops
Date: Wed, 6 Apr 2022 17:32:31 +0000
Message-ID: <196d51a3-b3cc-02ae-0d7d-ee6fbb4d50e4@oracle.com>
In-Reply-To: <Yk0i/pODntZ7lbDo@infradead.org>

On 4/5/2022 10:19 PM, Christoph Hellwig wrote:
> On Tue, Apr 05, 2022 at 01:47:45PM -0600, Jane Chu wrote:
>> Introduce DAX_RECOVERY flag to dax_direct_access(). The flag is
>> not set by default in dax_direct_access() such that the helper
>> does not translate a pmem range to kernel virtual address if the
>> range contains uncorrectable errors.  When the flag is set,
>> the helper ignores the UEs and returns the kernel virtual address so
>> that the caller may get on with data recovery via write.
>>
>> Also introduce a new dev_pagemap_ops .recovery_write function.
>> The function is applicable to FSDAX device only. The device
>> page backend driver provides .recovery_write function if the
>> device has underlying mechanism to clear the uncorrectable
>> errors on the fly.
> 
> I know Dan suggested it, but I still think dev_pagemap_ops is the very
> wrong choice here.  It is about VM callbacks to ZONE_DEVICE owners
> independent of what pagemap type they are.  .recovery_write on the
> other hand is completely specific to the DAX write path and has no
> MM interactions at all.

Yes, I believe Dan wanted to avoid the dm dance that adding
.recovery_write to dax_operations would entail.

I understand your point that .recovery_write is device-specific and
thus not appropriate for device-agnostic ops.

I can see 2 options so far -

1)  add .recovery_write to dax_operations and do the dm dance to hunt
down the base device that actually provides the recovery action

2)  an ugly but expedient approach, based on the observation that
dax_direct_access() has already gone through the dm dance and thus could
scoop up the .recovery_write function pointer if the DAX_RECOVERY flag is
set.  That is, bundle the action flag with the action; if more
device-specific actions are needed later, just add another action with
its associated flag.

I'm thinking about something like this:

    long dax_direct_access(struct dax_device *dax_dev, pgoff_t pgoff,
                           long nr_pages, struct daxdev_specific *action,
                           int flags, void **kaddr, pfn_t *pfn);

    where

    struct daxdev_specific {
            int flags;      /* DAX_RECOVERY, etc */
            size_t (*recovery_write)(pfn_t pfn, pgoff_t pgoff, void *addr,
                                     size_t bytes, void *iter);
    };

    __pmem_direct_access() provides the .recovery_write function pointer;
    dax_iomap_iter() ends up directly invoking that function in pmem.c,
    which finds the pgmap from the pfn_t, and the (struct pmem *) from
    pgmap->owner.

In this way, we get rid of the dax_recovery_write() interface as well as
the dm dance.
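
To make option 2 concrete, here is a minimal userspace sketch of the idea, not kernel code. Everything prefixed mock_/MOCK_ is hypothetical, the DAX_RECOVERY value is an assumption, the types are simplified stand-ins, and iter is treated as a plain source buffer. It only shows the direct-access call refusing a poisoned page by default and, when DAX_RECOVERY is set, handing back the driver's recovery_write pointer through the action bundle.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical userspace stand-ins for kernel types and constants. */
#define DAX_RECOVERY	0x1	/* flag value is an assumption */
#define MOCK_PAGE_SIZE	4096UL
#define MOCK_NR_PAGES	4UL

typedef unsigned long pgoff_t;
typedef struct { unsigned long val; } pfn_t;

/* The action bundle from the proposal above. */
struct daxdev_specific {
	int flags;	/* DAX_RECOVERY, etc */
	size_t (*recovery_write)(pfn_t pfn, pgoff_t pgoff, void *addr,
				 size_t bytes, void *iter);
};

/* Mock pmem device: a data area plus one poison marker per page. */
static struct mock_pmem {
	unsigned char data[MOCK_NR_PAGES * MOCK_PAGE_SIZE];
	int poisoned[MOCK_NR_PAGES];
} mock_dev;

/*
 * Driver-side recovery write: clear the poison, then store the data.
 * The real function would locate the device from pfn via pgmap->owner;
 * the mock has a single global device and ignores pfn.
 */
static size_t mock_recovery_write(pfn_t pfn, pgoff_t pgoff, void *addr,
				  size_t bytes, void *iter)
{
	(void)pfn;
	if (pgoff >= MOCK_NR_PAGES || bytes > MOCK_PAGE_SIZE)
		return 0;
	mock_dev.poisoned[pgoff] = 0;
	memcpy(addr, iter, bytes);
	return bytes;
}

/*
 * Single-page mock of the extended direct-access call: a poisoned page
 * is refused unless DAX_RECOVERY is requested, in which case the
 * driver's recovery_write pointer is returned through the bundle.
 */
static long mock_dax_direct_access(pgoff_t pgoff,
				   struct daxdev_specific *action,
				   void **kaddr)
{
	if (pgoff >= MOCK_NR_PAGES)
		return -EINVAL;
	if (mock_dev.poisoned[pgoff]) {
		if (!action || !(action->flags & DAX_RECOVERY))
			return -EIO;
		action->recovery_write = mock_recovery_write;
	}
	*kaddr = mock_dev.data + pgoff * MOCK_PAGE_SIZE;
	return 1;	/* number of pages mapped */
}
```

A caller would pass NULL (or a bundle without DAX_RECOVERY) on the normal path, and retry with DAX_RECOVERY set only after seeing -EIO on a write, then call action->recovery_write() on the returned address.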

What do you think?

Dan, could you also chime in?

> 
>>   /* see "strong" declaration in tools/testing/nvdimm/pmem-dax.c */
>>   __weak long __pmem_direct_access(struct pmem_device *pmem, pgoff_t pgoff,
>> -		long nr_pages, void **kaddr, pfn_t *pfn)
>> +		long nr_pages, int flags, void **kaddr, pfn_t *pfn)
>>   {
>>   	resource_size_t offset = PFN_PHYS(pgoff) + pmem->data_offset;
>> +	sector_t sector = PFN_PHYS(pgoff) >> SECTOR_SHIFT;
>> +	unsigned int num = PFN_PHYS(nr_pages) >> SECTOR_SHIFT;
>> +	struct badblocks *bb = &pmem->bb;
>> +	sector_t first_bad;
>> +	int num_bad;
>> +	bool bad_in_range;
>> +	long actual_nr;
>> +
>> +	if (!bb->count)
>> +		bad_in_range = false;
>> +	else
>> +		bad_in_range = !!badblocks_check(bb, sector, num, &first_bad, &num_bad);
>>   
>> -	if (unlikely(is_bad_pmem(&pmem->bb, PFN_PHYS(pgoff) / 512,
>> -					PFN_PHYS(nr_pages))))
>> +	if (bad_in_range && !(flags & DAX_RECOVERY))
>>   		return -EIO;
> 
> The use of bad_in_range here seems a little convoluted.  See the attached
> patch on how I would structure the function to avoid the variable and
> have the recovery code in a self-contained chunk.

Much better, will use your version, thanks!
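
The attached patch is not reproduced in this message. Purely as an illustration of the shape being discussed (this is not Christoph's version), the sketch below folds the bad-block check and the recovery handling into one self-contained branch, so no bad_in_range variable is needed. It is userspace C with hypothetical mock_ helpers: mock_badblocks_check() is a simplified stand-in for badblocks_check(), the DAX_RECOVERY value is an assumption, and returning only the whole clean pages ahead of the first bad sector (or a single page when the range starts on a bad page) is a design choice made for this sketch.

```c
#include <errno.h>
#include <stddef.h>

#define DAX_RECOVERY	0x1	/* flag value is an assumption */
#define MOCK_PAGE_SHIFT	12
#define SECTOR_SHIFT	9

typedef unsigned long long sector_t;
typedef unsigned long pgoff_t;

/* Hypothetical bad-range record: [start, start + len) in 512-byte sectors. */
struct mock_bad_range {
	sector_t start;
	sector_t len;
};

/*
 * Simplified stand-in for badblocks_check(): returns 1 and reports the
 * first listed bad range that overlaps [s, s + num), clamped to that
 * window; returns 0 if nothing overlaps. Ranges are assumed sorted.
 */
static int mock_badblocks_check(const struct mock_bad_range *bb, size_t count,
				sector_t s, sector_t num,
				sector_t *first_bad, sector_t *num_bad)
{
	size_t i;

	for (i = 0; i < count; i++) {
		sector_t bad_end = bb[i].start + bb[i].len;
		sector_t lo = bb[i].start > s ? bb[i].start : s;
		sector_t hi = bad_end < s + num ? bad_end : s + num;

		if (lo < hi) {
			*first_bad = lo;
			*num_bad = hi - lo;
			return 1;
		}
	}
	return 0;
}

/* Returns the number of pages the caller may access, or -EIO. */
static long mock_pmem_direct_access(const struct mock_bad_range *bb,
				    size_t count, pgoff_t pgoff,
				    long nr_pages, int flags)
{
	/* One 4 KiB page is 8 sectors: shift by PAGE_SHIFT - SECTOR_SHIFT. */
	sector_t sector = (sector_t)pgoff << (MOCK_PAGE_SHIFT - SECTOR_SHIFT);
	sector_t num = (sector_t)nr_pages << (MOCK_PAGE_SHIFT - SECTOR_SHIFT);
	sector_t first_bad, num_bad;

	if (count &&
	    mock_badblocks_check(bb, count, sector, num, &first_bad, &num_bad)) {
		long clean_pages;

		if (!(flags & DAX_RECOVERY))
			return -EIO;

		/* Whole clean pages ahead of the first bad sector. */
		clean_pages = (long)((first_bad - sector) >>
				     (MOCK_PAGE_SHIFT - SECTOR_SHIFT));
		return clean_pages ? clean_pages : 1;
	}

	return nr_pages;
}
```

With the recovery logic confined to the single if block, the common no-badblocks path stays a straight fall-through to the normal return.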

> 
>> -		map_len = dax_direct_access(dax_dev, pgoff, PHYS_PFN(size),
>> -				&kaddr, NULL);
>> +		nrpg = PHYS_PFN(size);
>> +		map_len = dax_direct_access(dax_dev, pgoff, nrpg, 0, &kaddr, NULL);
> 
> Overly long line here.

Okay, will run the checkpatch.pl test again.

thanks!
-jane