* question about punch hole
From: Yongqiang Yang @ 2011-08-26  2:53 UTC (permalink / raw)
  To: Allison Henderson; +Cc: Ext4 Developers List

Hi Allison,

Currently, punch hole flushes all pages to disk and releases the pages in
the page cache, and then calls ext4_ext_map_blocks.

Suppose a new page in the punched range is mapped after the pages are
released but before down_write(i_data_sem).  ext4_ext_map_blocks will then
remove the mapping for that page from the extent tree.  However, the upper
layers do not know this, and they still think the page is mapped.

I cannot find how punch hole handles this situation.  Could you shed
some light on it?
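
The interleaving described above can be sketched with a toy model. This is
plain Python, not kernel code: the class, its methods, and the block numbers
are all hypothetical stand-ins, and the three calls at the bottom simply
replay the suspected ordering by hand.

```python
# Toy model of the suspected interleaving; not kernel code.
# All names and block numbers are hypothetical.

class ToyInode:
    def __init__(self, extents):
        self.extent_tree = dict(extents)  # logical block -> physical block
        self.page_cache = {}              # page index -> block the page thinks it maps to

    def map_page(self, index):
        """Upper layer looks up a block and caches the mapping in the page."""
        phys = self.extent_tree.get(index)
        if phys is not None:
            self.page_cache[index] = phys
        return phys

    def release_pages(self, start, end):
        """Punch hole, step 1: flush and drop cached pages in [start, end)."""
        for index in range(start, end):
            self.page_cache.pop(index, None)

    def remove_extents(self, start, end):
        """Punch hole, step 2: free blocks in the extent tree (under i_data_sem)."""
        for index in range(start, end):
            self.extent_tree.pop(index, None)

    def stale_pages(self):
        """Pages whose cached mapping no longer matches the extent tree."""
        return sorted(i for i, phys in self.page_cache.items()
                      if self.extent_tree.get(i) != phys)


inode = ToyInode({n: 1000 + n for n in range(8)})
inode.release_pages(2, 6)    # punch hole, step 1
inode.map_page(4)            # another task maps a page inside the window
inode.remove_extents(2, 6)   # punch hole, step 2
print(inode.stale_pages())   # [4]
```

In this model, page 4 still carries a mapping to a block that the extent tree
has already given up, which is the inconsistency in question.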


-- 
Best Wishes
Yongqiang Yang


* Re: question about punch hole
From: Allison Henderson @ 2011-08-26 22:35 UTC (permalink / raw)
  To: Yongqiang Yang; +Cc: Ext4 Developers List

Hi Yongqiang

This is a really good question, and at the moment I'm still looking into
it.  :)  The calling sequence in punch hole was modeled after truncate,
which also only locks i_data_sem when modifying the extent tree.
ext4_ext_map_blocks, when called with the punch hole flag, only releases
blocks in the extent tree, using the same routines truncate does, but it
does not modify the state of the pages.  That still does not prevent the
race condition you describe, though, so I am still investigating it.
I've found that I can catch a lot of race conditions by simply running
the stress test overnight, and so far I haven't had anything like this
come up, but that certainly doesn't mean it's not there.  I will let you
know what I find.  Thx!

Allison Henderson


* Re: question about punch hole
From: Yongqiang Yang @ 2011-08-27  9:04 UTC (permalink / raw)
  To: Allison Henderson; +Cc: Ext4 Developers List

Hi Allison,

I had a look at the truncate code.  Truncates and writes are serialized by
inode->i_mutex in the VFS layer, but fallocate does not take i_mutex, so
I think we need to take i_mutex when punching a hole as well.  Ordinary
fallocate behaves differently from punching a hole, so it is safe without
taking i_mutex.
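
As a rough illustration of that serialization, here is a toy Python model
(not kernel code). The `threading.Lock` named `i_mutex` only stands in for
inode->i_mutex, and the dictionaries, function names, and block numbers are
hypothetical. It assumes the writer takes the same lock, as the VFS write
path is described as doing above.

```python
# Toy model: one lock held across both punch-hole steps; not kernel code.
import threading
import time

extent_tree = {n: 1000 + n for n in range(8)}   # logical -> physical block
page_cache = {}                                  # page index -> cached physical block
i_mutex = threading.Lock()                       # stands in for inode->i_mutex
pages_released = threading.Event()


def punch_hole(start, end):
    with i_mutex:                                # held across both steps
        for i in range(start, end):
            page_cache.pop(i, None)              # step 1: release pages
        pages_released.set()                     # the window the writer aims for
        time.sleep(0.05)                         # give the writer time to try
        for i in range(start, end):
            extent_tree.pop(i, None)             # step 2: remove extents


def writer(index):
    pages_released.wait()                        # try to slip into the window
    with i_mutex:                                # blocks until punch_hole is done
        phys = extent_tree.get(index)
        if phys is not None:
            page_cache[index] = phys             # only maps blocks that still exist


threads = [threading.Thread(target=punch_hole, args=(2, 6)),
           threading.Thread(target=writer, args=(4,))]
for t in threads:
    t.start()
for t in threads:
    t.join()

stale = [i for i, phys in page_cache.items() if extent_tree.get(i) != phys]
print(stale)  # []
```

Because the writer cannot get the lock until both steps have finished, it
only ever sees the extent tree after the hole has been punched, so no page
ends up with a stale mapping in this model.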


What's your opinion?

Yongqiang.
-- 
Best Wishes
Yongqiang Yang
--
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


* Re: question about punch hole
From: Yongqiang Yang @ 2011-08-27  9:33 UTC (permalink / raw)
  To: Allison Henderson; +Cc: Ext4 Developers List

It seems that a race exists between reads and punching a hole as well.  If
a read comes in after the pages are released and before
down_write(i_data_sem), a page will be mapped; if that page is written
later, it will introduce an error.  Truncate avoids this situation by
setting the file size before truncating the pages.
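
The difference in ordering can be sketched with a toy Python model (not
kernel code; every name and number here is hypothetical). It assumes a read
path that refuses to map pages at or beyond the file size, which is what
makes the truncate ordering safe in the model, while punch hole leaves the
size unchanged.

```python
# Toy model contrasting truncate and punch hole; not kernel code.

class ToyInode:
    def __init__(self, nblocks):
        self.i_size = nblocks                        # file size, in blocks
        self.extent_tree = {n: 1000 + n for n in range(nblocks)}
        self.page_cache = {}                         # page index -> cached block

    def read_page(self, index):
        if index >= self.i_size:                     # beyond EOF: nothing is mapped
            return None
        phys = self.extent_tree.get(index)
        if phys is not None:
            self.page_cache[index] = phys
        return phys

    def drop_pages(self, start, end):
        for i in range(start, end):
            self.page_cache.pop(i, None)

    def drop_extents(self, start, end):
        for i in range(start, end):
            self.extent_tree.pop(i, None)

    def stale_pages(self):
        return sorted(i for i, phys in self.page_cache.items()
                      if self.extent_tree.get(i) != phys)


# Truncate to 2 blocks: the size is lowered first.
t = ToyInode(8)
t.i_size = 2
t.drop_pages(2, 8)
t.read_page(4)          # racing read is rejected by the size check
t.drop_extents(2, 8)

# Punch hole over [2, 6): the size is unchanged.
p = ToyInode(8)
p.drop_pages(2, 6)
p.read_page(4)          # racing read maps the page
p.drop_extents(2, 6)

print(t.stale_pages(), p.stale_pages())  # [] [4]
```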

Yongqiang.

-- 
Best Wishes
Yongqiang Yang


* Re: question about punch hole
From: Allison Henderson @ 2011-08-28  1:09 UTC (permalink / raw)
  To: Yongqiang Yang; +Cc: Ext4 Developers List

Hi Yongqiang,

Alrighty, I found the code for truncate that you are referring to and 
what you are saying makes a lot of sense, so I will add a fix for it in 
the punch hole patch set I am working on at the moment.  Thx for finding 
this one for me  :)

Allison Henderson
