All of lore.kernel.org
 help / color / mirror / Atom feed
* [question] Panic in dax_writeback_one
@ 2021-03-11  7:48 chenjun (AM)
  2021-03-11 12:19 ` Matthew Wilcox
  0 siblings, 1 reply; 7+ messages in thread
From: chenjun (AM) @ 2021-03-11  7:48 UTC (permalink / raw)
  To: linux-kernel, linux-fsdevel
  Cc: Dan Williams, Jan Kara, Xiangrui (Euler), lizhe (Y),
	yangerkun, zhangyi (F)

Hi

I write a driver to simulate memory as a block device (like a ramdisk). 
and I hope the memory used would not be observed by kernel, becasue of 
the struct page will take many memory.

When I mount ext2 with dax on my ramdisk. Panic will happen when fsync.
Call trace:
  dax_writeback_one+0x330/0x3e4
  dax_writeback_mapping_range+0x15c/0x318
  ext2_dax_writepages+0x38/0x44
....

static int dax_writeback_one(struct xa_state *xas, struct dax_device 
*dax_dev, struct address_space *mapping, void *entry)
----dax_flush(dax_dev, page_address(pfn_to_page(pfn)), count * PAGE_SIZE);
The pfn is returned by the driver. In my case, the pfn does not have
struct page. so pfn_to_page(pfn) return a wrong address.

I noticed the following changes may related to my problem:
1. Before 3fe0791c295cfd3cd735de7a32cc0780949c009f (dax: store pfns in 
the radix), the radix tree stroes sectors. address which would be 
flushed is got from driver by passing sector.
And the commit assume that all pfn have struct page.

2. CONFIG_FS_DAX_LIMITED (Selected by DCSSBLK [=n] && BLK_DEV [=y] && 
S390 && BLOCK [=y]) is added to avoid access struct page of pfn.

Does anyone have any idea about my problem.

-- 
Regards
Chen Jun

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [question] Panic in dax_writeback_one
  2021-03-11  7:48 [question] Panic in dax_writeback_one chenjun (AM)
@ 2021-03-11 12:19 ` Matthew Wilcox
  2021-03-11 17:25   ` Dan Williams
  2021-03-12  2:40   ` chenjun (AM)
  0 siblings, 2 replies; 7+ messages in thread
From: Matthew Wilcox @ 2021-03-11 12:19 UTC (permalink / raw)
  To: chenjun (AM)
  Cc: linux-kernel, linux-fsdevel, Dan Williams, Jan Kara,
	Xiangrui (Euler), lizhe (Y), yangerkun, zhangyi (F)

On Thu, Mar 11, 2021 at 07:48:25AM +0000, chenjun (AM) wrote:
> static int dax_writeback_one(struct xa_state *xas, struct dax_device 
> *dax_dev, struct address_space *mapping, void *entry)
> ----dax_flush(dax_dev, page_address(pfn_to_page(pfn)), count * PAGE_SIZE);
> The pfn is returned by the driver. In my case, the pfn does not have
> struct page. so pfn_to_page(pfn) return a wrong address.

I wasn't involved, but I think the right solution here is simply to
replace page_address(pfn_to_page(pfn)) with pfn_to_virt(pfn).  I don't
know why Dan decided to do this in the more complicated way.

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [question] Panic in dax_writeback_one
  2021-03-11 12:19 ` Matthew Wilcox
@ 2021-03-11 17:25   ` Dan Williams
  2021-03-17  2:59     ` chenjun (AM)
  2021-03-12  2:40   ` chenjun (AM)
  1 sibling, 1 reply; 7+ messages in thread
From: Dan Williams @ 2021-03-11 17:25 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: chenjun (AM),
	linux-kernel, linux-fsdevel, Jan Kara, Xiangrui (Euler),
	lizhe (Y), yangerkun, zhangyi (F)

On Thu, Mar 11, 2021 at 4:20 AM Matthew Wilcox <willy@infradead.org> wrote:
>
> On Thu, Mar 11, 2021 at 07:48:25AM +0000, chenjun (AM) wrote:
> > static int dax_writeback_one(struct xa_state *xas, struct dax_device
> > *dax_dev, struct address_space *mapping, void *entry)
> > ----dax_flush(dax_dev, page_address(pfn_to_page(pfn)), count * PAGE_SIZE);
> > The pfn is returned by the driver. In my case, the pfn does not have
> > struct page. so pfn_to_page(pfn) return a wrong address.
>
> I wasn't involved, but I think the right solution here is simply to
> replace page_address(pfn_to_page(pfn)) with pfn_to_virt(pfn).  I don't
> know why Dan decided to do this in the more complicated way.

pfn_to_virt() only works for the direct-map. If pages are not mapped I
don't see how pfn_to_virt() is expected to work.

The real question Chenjun is why are you writing a new simulator of
memory as a block-device vs reusing the pmem driver or brd?

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [question] Panic in dax_writeback_one
  2021-03-11 12:19 ` Matthew Wilcox
  2021-03-11 17:25   ` Dan Williams
@ 2021-03-12  2:40   ` chenjun (AM)
  1 sibling, 0 replies; 7+ messages in thread
From: chenjun (AM) @ 2021-03-12  2:40 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: linux-kernel, linux-fsdevel, Dan Williams, Jan Kara,
	Xiangrui (Euler), lizhe (Y), yangerkun, zhangyi (F)

在 2021/3/11 20:20, Matthew Wilcox 写道:
> On Thu, Mar 11, 2021 at 07:48:25AM +0000, chenjun (AM) wrote:
>> static int dax_writeback_one(struct xa_state *xas, struct dax_device
>> *dax_dev, struct address_space *mapping, void *entry)
>> ----dax_flush(dax_dev, page_address(pfn_to_page(pfn)), count * PAGE_SIZE);
>> The pfn is returned by the driver. In my case, the pfn does not have
>> struct page. so pfn_to_page(pfn) return a wrong address.
> 
> I wasn't involved, but I think the right solution here is simply to
> replace page_address(pfn_to_page(pfn)) with pfn_to_virt(pfn).  I don't
> know why Dan decided to do this in the more complicated way.
> 

Thanks Mattherw

I think pfn_to_virt could not work in my case. Because of the pfn is not 
in memory block.

-- 
Regards
Chen Jun

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [question] Panic in dax_writeback_one
  2021-03-11 17:25   ` Dan Williams
@ 2021-03-17  2:59     ` chenjun (AM)
  2021-03-17  4:55       ` Dan Williams
  0 siblings, 1 reply; 7+ messages in thread
From: chenjun (AM) @ 2021-03-17  2:59 UTC (permalink / raw)
  To: Dan Williams, Matthew Wilcox
  Cc: linux-kernel, linux-fsdevel, Jan Kara, Xiangrui (Euler),
	lizhe (Y), yangerkun, zhangyi (F)

在 2021/3/12 1:25, Dan Williams 写道:
> On Thu, Mar 11, 2021 at 4:20 AM Matthew Wilcox <willy@infradead.org> wrote:
>>
>> On Thu, Mar 11, 2021 at 07:48:25AM +0000, chenjun (AM) wrote:
>>> static int dax_writeback_one(struct xa_state *xas, struct dax_device
>>> *dax_dev, struct address_space *mapping, void *entry)
>>> ----dax_flush(dax_dev, page_address(pfn_to_page(pfn)), count * PAGE_SIZE);
>>> The pfn is returned by the driver. In my case, the pfn does not have
>>> struct page. so pfn_to_page(pfn) return a wrong address.
>>
>> I wasn't involved, but I think the right solution here is simply to
>> replace page_address(pfn_to_page(pfn)) with pfn_to_virt(pfn).  I don't
>> know why Dan decided to do this in the more complicated way.
> 
> pfn_to_virt() only works for the direct-map. If pages are not mapped I
> don't see how pfn_to_virt() is expected to work.
> 
> The real question Chenjun is why are you writing a new simulator of
> memory as a block-device vs reusing the pmem driver or brd?
> 

Hi Dan

In my case, I do not want to take memory to create the struct page of 
the memory my driver used.

And, I think this is also a problem for DCSSBLK.

So I want to go back the older way if CONFIG_FS_DAX_LIMITED

diff --git a/fs/dax.c b/fs/dax.c
index b3d27fd..6395e84 100644
--- a/fs/dax.c
+++ b/fs/dax.c
@@ -867,6 +867,9 @@ static int dax_writeback_one(struct xa_state *xas, 
struct dax_device *dax_dev,
  {
  	unsigned long pfn, index, count;
  	long ret = 0;
+	void *kaddr;
+	pfn_t new_pfn_t;
+	pgoff_t pgoff;

  	/*
  	 * A page got tagged dirty in DAX mapping? Something is seriously
@@ -926,7 +929,25 @@ static int dax_writeback_one(struct xa_state *xas, 
struct dax_device *dax_dev,
  	index = xas->xa_index & ~(count - 1);

  	dax_entry_mkclean(mapping, index, pfn);
-	dax_flush(dax_dev, page_address(pfn_to_page(pfn)), count * PAGE_SIZE);
+
+	if (!IS_ENABLED(CONFIG_FS_DAX_LIMITED) || pfn_valid(pfn))
+		kaddr = page_address(pfn_to_page(pfn));
+	else {
+		ret = bdev_dax_pgoff(mapping->host->i_sb->s_bdev, pfn << 
PFN_SECTION_SHIFT, count << PAGE_SHIFT, &pgoff);
+		if (ret)
+			goto put_unlocked;
+
+		ret = dax_direct_access(dax_dev, pgoff, count, &kaddr, &new_pfn_t);
+		if (ret < 0)
+			goto put_unlocked;
+
+		if (WARN_ON_ONCE(ret < count) || WARN_ON_ONCE(pfn_t_to_pfn(new_pfn_t) 
!= pfn)) {
+			ret = -EIO;
+		        goto put_unlocked;
+		}
+	}
+
+	dax_flush(dax_dev, kaddr, count * PAGE_SIZE);
  	/*
  	 * After we have flushed the cache, we can clear the dirty tag. There
  	 * cannot be new dirty data in the pfn after the flush has completed as
-- 

-- 
Regards
Chen Jun

^ permalink raw reply related	[flat|nested] 7+ messages in thread

* Re: [question] Panic in dax_writeback_one
  2021-03-17  2:59     ` chenjun (AM)
@ 2021-03-17  4:55       ` Dan Williams
  2021-03-17  6:45         ` chenjun (AM)
  0 siblings, 1 reply; 7+ messages in thread
From: Dan Williams @ 2021-03-17  4:55 UTC (permalink / raw)
  To: chenjun (AM)
  Cc: Matthew Wilcox, linux-kernel, linux-fsdevel, Jan Kara,
	Xiangrui (Euler), lizhe (Y), yangerkun, zhangyi (F),
	Joao Martins

On Tue, Mar 16, 2021 at 8:00 PM chenjun (AM) <chenjun102@huawei.com> wrote:
>
> 在 2021/3/12 1:25, Dan Williams 写道:
> > On Thu, Mar 11, 2021 at 4:20 AM Matthew Wilcox <willy@infradead.org> wrote:
> >>
> >> On Thu, Mar 11, 2021 at 07:48:25AM +0000, chenjun (AM) wrote:
> >>> static int dax_writeback_one(struct xa_state *xas, struct dax_device
> >>> *dax_dev, struct address_space *mapping, void *entry)
> >>> ----dax_flush(dax_dev, page_address(pfn_to_page(pfn)), count * PAGE_SIZE);
> >>> The pfn is returned by the driver. In my case, the pfn does not have
> >>> struct page. so pfn_to_page(pfn) return a wrong address.
> >>
> >> I wasn't involved, but I think the right solution here is simply to
> >> replace page_address(pfn_to_page(pfn)) with pfn_to_virt(pfn).  I don't
> >> know why Dan decided to do this in the more complicated way.
> >
> > pfn_to_virt() only works for the direct-map. If pages are not mapped I
> > don't see how pfn_to_virt() is expected to work.
> >
> > The real question Chenjun is why are you writing a new simulator of
> > memory as a block-device vs reusing the pmem driver or brd?
> >
>
> Hi Dan
>
> In my case, I do not want to take memory to create the struct page of
> the memory my driver used.

There are efforts happening to drastically reduce that overhead. You
might want to check out Joao's work [1]. I think that direction holds
more promise than trying to extend FS_DAX_LIMITED.

[1]: http://lore.kernel.org/r/20201208172901.17384-1-joao.m.martins@oracle.com

> And, I think this is also a problem for DCSSBLK.

If I understand correctly DAX replaced XIP for S390. There have not
been reports about this problem, and I can only guess because XIP
(eXecute-In-Place) is a read-only use case where dax_writeback_one()
is never triggered, or S390 just isn't using DCSSBLK anymore. The last
time I touched FS_DAX_LIMITED the DCSSBLK maintainers offered to just
delete this driver to get it out of the way.

>
> So I want to go back the older way if CONFIG_FS_DAX_LIMITED
>
> diff --git a/fs/dax.c b/fs/dax.c
> index b3d27fd..6395e84 100644
> --- a/fs/dax.c
> +++ b/fs/dax.c
> @@ -867,6 +867,9 @@ static int dax_writeback_one(struct xa_state *xas,
> struct dax_device *dax_dev,
>   {
>         unsigned long pfn, index, count;
>         long ret = 0;
> +       void *kaddr;
> +       pfn_t new_pfn_t;
> +       pgoff_t pgoff;
>
>         /*
>          * A page got tagged dirty in DAX mapping? Something is seriously
> @@ -926,7 +929,25 @@ static int dax_writeback_one(struct xa_state *xas,
> struct dax_device *dax_dev,
>         index = xas->xa_index & ~(count - 1);
>
>         dax_entry_mkclean(mapping, index, pfn);
> -       dax_flush(dax_dev, page_address(pfn_to_page(pfn)), count * PAGE_SIZE);
> +
> +       if (!IS_ENABLED(CONFIG_FS_DAX_LIMITED) || pfn_valid(pfn))
> +               kaddr = page_address(pfn_to_page(pfn));
> +       else {
> +               ret = bdev_dax_pgoff(mapping->host->i_sb->s_bdev, pfn <<
> PFN_SECTION_SHIFT, count << PAGE_SHIFT, &pgoff);

This is broken:

    mapping->host->i_sb->s_bdev

...there is no guarantee that the superblock associated with the
mapping is hosted on the same block device associated with the passed
in dax_device. See dax_rtdev in xfs_open_devices().

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [question] Panic in dax_writeback_one
  2021-03-17  4:55       ` Dan Williams
@ 2021-03-17  6:45         ` chenjun (AM)
  0 siblings, 0 replies; 7+ messages in thread
From: chenjun (AM) @ 2021-03-17  6:45 UTC (permalink / raw)
  To: Dan Williams
  Cc: Matthew Wilcox, linux-kernel, linux-fsdevel, Jan Kara,
	Xiangrui (Euler), lizhe (Y), yangerkun, zhangyi (F),
	Joao Martins

Thanks for the advices. I will check out that.

在 2021/3/17 12:55, Dan Williams 写道:
> On Tue, Mar 16, 2021 at 8:00 PM chenjun (AM) <chenjun102@huawei.com> wrote:
>>
>> 在 2021/3/12 1:25, Dan Williams 写道:
>>> On Thu, Mar 11, 2021 at 4:20 AM Matthew Wilcox <willy@infradead.org> wrote:
>>>>
>>>> On Thu, Mar 11, 2021 at 07:48:25AM +0000, chenjun (AM) wrote:
>>>>> static int dax_writeback_one(struct xa_state *xas, struct dax_device
>>>>> *dax_dev, struct address_space *mapping, void *entry)
>>>>> ----dax_flush(dax_dev, page_address(pfn_to_page(pfn)), count * PAGE_SIZE);
>>>>> The pfn is returned by the driver. In my case, the pfn does not have
>>>>> struct page. so pfn_to_page(pfn) return a wrong address.
>>>>
>>>> I wasn't involved, but I think the right solution here is simply to
>>>> replace page_address(pfn_to_page(pfn)) with pfn_to_virt(pfn).  I don't
>>>> know why Dan decided to do this in the more complicated way.
>>>
>>> pfn_to_virt() only works for the direct-map. If pages are not mapped I
>>> don't see how pfn_to_virt() is expected to work.
>>>
>>> The real question Chenjun is why are you writing a new simulator of
>>> memory as a block-device vs reusing the pmem driver or brd?
>>>
>>
>> Hi Dan
>>
>> In my case, I do not want to take memory to create the struct page of
>> the memory my driver used.
> 
> There are efforts happening to drastically reduce that overhead. You
> might want to check out Joao's work [1]. I think that direction holds
> more promise than trying to extend FS_DAX_LIMITED.
> 
> [1]: http://lore.kernel.org/r/20201208172901.17384-1-joao.m.martins@oracle.com
> 
>> And, I think this is also a problem for DCSSBLK.
> 
> If I understand correctly DAX replaced XIP for S390. There have not
> been reports about this problem, and I can only guess because XIP
> (eXecute-In-Place) is a read-only use case where dax_writeback_one()
> is never triggered, or S390 just isn't using DCSSBLK anymore. The last
> time I touched FS_DAX_LIMITED the DCSSBLK maintainers offered to just
> delete this driver to get it out of the way.
> 
>>
>> So I want to go back the older way if CONFIG_FS_DAX_LIMITED
>>
>> diff --git a/fs/dax.c b/fs/dax.c
>> index b3d27fd..6395e84 100644
>> --- a/fs/dax.c
>> +++ b/fs/dax.c
>> @@ -867,6 +867,9 @@ static int dax_writeback_one(struct xa_state *xas,
>> struct dax_device *dax_dev,
>>    {
>>          unsigned long pfn, index, count;
>>          long ret = 0;
>> +       void *kaddr;
>> +       pfn_t new_pfn_t;
>> +       pgoff_t pgoff;
>>
>>          /*
>>           * A page got tagged dirty in DAX mapping? Something is seriously
>> @@ -926,7 +929,25 @@ static int dax_writeback_one(struct xa_state *xas,
>> struct dax_device *dax_dev,
>>          index = xas->xa_index & ~(count - 1);
>>
>>          dax_entry_mkclean(mapping, index, pfn);
>> -       dax_flush(dax_dev, page_address(pfn_to_page(pfn)), count * PAGE_SIZE);
>> +
>> +       if (!IS_ENABLED(CONFIG_FS_DAX_LIMITED) || pfn_valid(pfn))
>> +               kaddr = page_address(pfn_to_page(pfn));
>> +       else {
>> +               ret = bdev_dax_pgoff(mapping->host->i_sb->s_bdev, pfn <<
>> PFN_SECTION_SHIFT, count << PAGE_SHIFT, &pgoff);
> 
> This is broken:
> 
>      mapping->host->i_sb->s_bdev
> 
> ...there is no guarantee that the superblock associated with the
> mapping is hosted on the same block device associated with the passed
> in dax_device. See dax_rtdev in xfs_open_devices().
> 


-- 
Regards
Chen Jun

^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2021-03-17  6:46 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-03-11  7:48 [question] Panic in dax_writeback_one chenjun (AM)
2021-03-11 12:19 ` Matthew Wilcox
2021-03-11 17:25   ` Dan Williams
2021-03-17  2:59     ` chenjun (AM)
2021-03-17  4:55       ` Dan Williams
2021-03-17  6:45         ` chenjun (AM)
2021-03-12  2:40   ` chenjun (AM)

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.