[Ocfs2-devel] [PATCH] ocfs2: fix deadlock on mmapped page in ocfs2_write_begin

All of lore.kernel.org
 help / color / mirror / Atom feed

* [Ocfs2-devel] [PATCH] ocfs2: fix deadlock on mmapped page in ocfs2_write_begin_nolock()
@ 2016-09-10  9:55 Eric Ren
  2016-09-12  1:37 ` Joseph Qi
  2016-09-12  2:03 ` Gang He
  0 siblings, 2 replies; 10+ messages in thread
From: Eric Ren @ 2016-09-10  9:55 UTC (permalink / raw)
  To: ocfs2-devel

The testcase "mmaptruncate" of ocfs2-test deadlocked occasionally.

In this testcase, we create a 2*CLUSTER_SIZE file and mmap() on it;
there are 2 process repeatedly performing the following operations
respectively: one is doing memset(mmaped_addr + 2*CLUSTER_SIZE - 1,
'a', 1), while the another is playing ftruncate(fd, 2*CLUSTER_SIZE)
and then ftruncate(fd, CLUSTER_SIZE) again and again.

This is the backtrace when the deadlock happens:
[<ffffffff817054f0>] __wait_on_bit_lock+0x50/0xa0
[<ffffffff81199bd7>] __lock_page+0xb7/0xc0
[<ffffffff810c4de0>] ? autoremove_wake_function+0x40/0x40
[<ffffffffa0440f4f>] ocfs2_write_begin_nolock+0x163f/0x1790 [ocfs2]
[<ffffffffa0462a50>] ? ocfs2_allocate_extend_trans+0x180/0x180 [ocfs2]
[<ffffffffa0467b47>] ocfs2_page_mkwrite+0x1c7/0x2a0 [ocfs2]
[<ffffffff811cf286>] do_page_mkwrite+0x66/0xc0
[<ffffffff811d3635>] handle_mm_fault+0x685/0x1350
[<ffffffff81039dc0>] ? __fpu__restore_sig+0x70/0x530
[<ffffffff810694c8>] __do_page_fault+0x1d8/0x4d0
[<ffffffff81069827>] trace_do_page_fault+0x37/0xf0
[<ffffffff81061e69>] do_async_page_fault+0x19/0x70
[<ffffffff8170ac98>] async_page_fault+0x28/0x30

In ocfs2_write_begin_nolock(), we first grab the pages and then
allocate disk space for this write; ocfs2_try_to_free_truncate_log()
will be called if ENOSPC is turned; if we're lucky to get enough clusters,
which is usually the case, we start over again. But in ocfs2_free_write_ctxt()
the target page isn't unlocked, so we will deadlock when trying to grab
the target page again.

Fix this issue by unlocking the target page after we fail to allocate
enough space at the first time.

Jan Kara helps me clear out the JBD2 part, and suggest the hint for root cause.

Signed-off-by: Eric Ren <zren@suse.com>
---
 fs/ocfs2/aops.c | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/fs/ocfs2/aops.c b/fs/ocfs2/aops.c
index 98d3654..78d1d67 100644
--- a/fs/ocfs2/aops.c
+++ b/fs/ocfs2/aops.c
@@ -1860,6 +1860,13 @@ out:
 		 */
 		try_free = 0;

+		/*
+		 * Unlock mmap_page because the page has been locked when we
+		 * are here.
+		 */
+		if (mmap_page)
+			unlock_page(mmap_page);
+
 		ret1 = ocfs2_try_to_free_truncate_log(osb, clusters_need);
 		if (ret1 == 1)
 			goto try_again;
-- 
2.6.6

^ permalink raw reply related	[flat|nested] 10+ messages in thread

* [Ocfs2-devel] [PATCH] ocfs2: fix deadlock on mmapped page in ocfs2_write_begin_nolock()
  2016-09-10  9:55 [Ocfs2-devel] [PATCH] ocfs2: fix deadlock on mmapped page in ocfs2_write_begin_nolock() Eric Ren
@ 2016-09-12  1:37 ` Joseph Qi
  2016-09-12  2:04   ` Eric Ren
  2016-09-12  2:03 ` Gang He
  1 sibling, 1 reply; 10+ messages in thread
From: Joseph Qi @ 2016-09-12  1:37 UTC (permalink / raw)
  To: ocfs2-devel

Hi Eric,

On 2016/9/10 17:55, Eric Ren wrote:
> The testcase "mmaptruncate" of ocfs2-test deadlocked occasionally.
> 
> In this testcase, we create a 2*CLUSTER_SIZE file and mmap() on it;
> there are 2 process repeatedly performing the following operations
> respectively: one is doing memset(mmaped_addr + 2*CLUSTER_SIZE - 1,
> 'a', 1), while the another is playing ftruncate(fd, 2*CLUSTER_SIZE)
> and then ftruncate(fd, CLUSTER_SIZE) again and again.
> 
> This is the backtrace when the deadlock happens:
> [<ffffffff817054f0>] __wait_on_bit_lock+0x50/0xa0
> [<ffffffff81199bd7>] __lock_page+0xb7/0xc0
> [<ffffffff810c4de0>] ? autoremove_wake_function+0x40/0x40
> [<ffffffffa0440f4f>] ocfs2_write_begin_nolock+0x163f/0x1790 [ocfs2]
> [<ffffffffa0462a50>] ? ocfs2_allocate_extend_trans+0x180/0x180 [ocfs2]
> [<ffffffffa0467b47>] ocfs2_page_mkwrite+0x1c7/0x2a0 [ocfs2]
> [<ffffffff811cf286>] do_page_mkwrite+0x66/0xc0
> [<ffffffff811d3635>] handle_mm_fault+0x685/0x1350
> [<ffffffff81039dc0>] ? __fpu__restore_sig+0x70/0x530
> [<ffffffff810694c8>] __do_page_fault+0x1d8/0x4d0
> [<ffffffff81069827>] trace_do_page_fault+0x37/0xf0
> [<ffffffff81061e69>] do_async_page_fault+0x19/0x70
> [<ffffffff8170ac98>] async_page_fault+0x28/0x30
> 
> In ocfs2_write_begin_nolock(), we first grab the pages and then
> allocate disk space for this write; ocfs2_try_to_free_truncate_log()
> will be called if ENOSPC is turned; if we're lucky to get enough clusters,
> which is usually the case, we start over again. But in ocfs2_free_write_ctxt()
> the target page isn't unlocked, so we will deadlock when trying to grab
> the target page again.
IMO, in ocfs2_grab_pages_for_write, mmap_page is mapping to w_pages and
w_target_locked is set to true, and then will be unlocked by
ocfs2_unlock_pages in ocfs2_free_write_ctxt.
So I'm not getting the case "page isn't unlock". Could you please explain
it in more detail?

Thanks,
Joseph

> 
> Fix this issue by unlocking the target page after we fail to allocate
> enough space at the first time.
> 
> Jan Kara helps me clear out the JBD2 part, and suggest the hint for root cause.
> 
> Signed-off-by: Eric Ren <zren@suse.com>
> ---
>  fs/ocfs2/aops.c | 7 +++++++
>  1 file changed, 7 insertions(+)
> 
> diff --git a/fs/ocfs2/aops.c b/fs/ocfs2/aops.c
> index 98d3654..78d1d67 100644
> --- a/fs/ocfs2/aops.c
> +++ b/fs/ocfs2/aops.c
> @@ -1860,6 +1860,13 @@ out:
>  		 */
>  		try_free = 0;
>  
> +		/*
> +		 * Unlock mmap_page because the page has been locked when we
> +		 * are here.
> +		 */
> +		if (mmap_page)
> +			unlock_page(mmap_page);
> +
>  		ret1 = ocfs2_try_to_free_truncate_log(osb, clusters_need);
>  		if (ret1 == 1)
>  			goto try_again;
> 

^ permalink raw reply	[flat|nested] 10+ messages in thread

* [Ocfs2-devel] [PATCH] ocfs2: fix deadlock on mmapped page in ocfs2_write_begin_nolock()
  2016-09-10  9:55 [Ocfs2-devel] [PATCH] ocfs2: fix deadlock on mmapped page in ocfs2_write_begin_nolock() Eric Ren
  2016-09-12  1:37 ` Joseph Qi
@ 2016-09-12  2:03 ` Gang He
  1 sibling, 0 replies; 10+ messages in thread
From: Gang He @ 2016-09-12  2:03 UTC (permalink / raw)
  To: ocfs2-devel

Reviewed-by: Gang He <ghe@suse.com>

Thanks
Gang


>>> 
> The testcase "mmaptruncate" of ocfs2-test deadlocked occasionally.
> 
> In this testcase, we create a 2*CLUSTER_SIZE file and mmap() on it;
> there are 2 process repeatedly performing the following operations
> respectively: one is doing memset(mmaped_addr + 2*CLUSTER_SIZE - 1,
> 'a', 1), while the another is playing ftruncate(fd, 2*CLUSTER_SIZE)
> and then ftruncate(fd, CLUSTER_SIZE) again and again.
> 
> This is the backtrace when the deadlock happens:
> [<ffffffff817054f0>] __wait_on_bit_lock+0x50/0xa0
> [<ffffffff81199bd7>] __lock_page+0xb7/0xc0
> [<ffffffff810c4de0>] ? autoremove_wake_function+0x40/0x40
> [<ffffffffa0440f4f>] ocfs2_write_begin_nolock+0x163f/0x1790 [ocfs2]
> [<ffffffffa0462a50>] ? ocfs2_allocate_extend_trans+0x180/0x180 [ocfs2]
> [<ffffffffa0467b47>] ocfs2_page_mkwrite+0x1c7/0x2a0 [ocfs2]
> [<ffffffff811cf286>] do_page_mkwrite+0x66/0xc0
> [<ffffffff811d3635>] handle_mm_fault+0x685/0x1350
> [<ffffffff81039dc0>] ? __fpu__restore_sig+0x70/0x530
> [<ffffffff810694c8>] __do_page_fault+0x1d8/0x4d0
> [<ffffffff81069827>] trace_do_page_fault+0x37/0xf0
> [<ffffffff81061e69>] do_async_page_fault+0x19/0x70
> [<ffffffff8170ac98>] async_page_fault+0x28/0x30
> 
> In ocfs2_write_begin_nolock(), we first grab the pages and then
> allocate disk space for this write; ocfs2_try_to_free_truncate_log()
> will be called if ENOSPC is turned; if we're lucky to get enough clusters,
> which is usually the case, we start over again. But in 
> ocfs2_free_write_ctxt()
> the target page isn't unlocked, so we will deadlock when trying to grab
> the target page again.
> 
> Fix this issue by unlocking the target page after we fail to allocate
> enough space at the first time.
> 
> Jan Kara helps me clear out the JBD2 part, and suggest the hint for root 
> cause.
> 
> Signed-off-by: Eric Ren <zren@suse.com>
> ---
>  fs/ocfs2/aops.c | 7 +++++++
>  1 file changed, 7 insertions(+)
> 
> diff --git a/fs/ocfs2/aops.c b/fs/ocfs2/aops.c
> index 98d3654..78d1d67 100644
> --- a/fs/ocfs2/aops.c
> +++ b/fs/ocfs2/aops.c
> @@ -1860,6 +1860,13 @@ out:
>  		 */
>  		try_free = 0;
>  
> +		/*
> +		 * Unlock mmap_page because the page has been locked when we
> +		 * are here.
> +		 */
> +		if (mmap_page)
> +			unlock_page(mmap_page);
> +
>  		ret1 = ocfs2_try_to_free_truncate_log(osb, clusters_need);
>  		if (ret1 == 1)
>  			goto try_again;
> -- 
> 2.6.6
> 
> 
> _______________________________________________
> Ocfs2-devel mailing list
> Ocfs2-devel at oss.oracle.com 
> https://oss.oracle.com/mailman/listinfo/ocfs2-devel

^ permalink raw reply	[flat|nested] 10+ messages in thread

* [Ocfs2-devel] [PATCH] ocfs2: fix deadlock on mmapped page in ocfs2_write_begin_nolock()
  2016-09-12  1:37 ` Joseph Qi
@ 2016-09-12  2:04   ` Eric Ren
  2016-09-12  3:06     ` Eric Ren
  2016-09-14  8:04     ` Eric Ren
  0 siblings, 2 replies; 10+ messages in thread
From: Eric Ren @ 2016-09-12  2:04 UTC (permalink / raw)
  To: ocfs2-devel

Hi Joseph,

On 09/12/2016 09:37 AM, Joseph Qi wrote:
> Hi Eric,
>
> On 2016/9/10 17:55, Eric Ren wrote:
>> The testcase "mmaptruncate" of ocfs2-test deadlocked occasionally.
>>
>> In this testcase, we create a 2*CLUSTER_SIZE file and mmap() on it;
>> there are 2 process repeatedly performing the following operations
>> respectively: one is doing memset(mmaped_addr + 2*CLUSTER_SIZE - 1,
>> 'a', 1), while the another is playing ftruncate(fd, 2*CLUSTER_SIZE)
>> and then ftruncate(fd, CLUSTER_SIZE) again and again.
>>
>> This is the backtrace when the deadlock happens:
>> [<ffffffff817054f0>] __wait_on_bit_lock+0x50/0xa0
>> [<ffffffff81199bd7>] __lock_page+0xb7/0xc0
>> [<ffffffff810c4de0>] ? autoremove_wake_function+0x40/0x40
>> [<ffffffffa0440f4f>] ocfs2_write_begin_nolock+0x163f/0x1790 [ocfs2]
>> [<ffffffffa0462a50>] ? ocfs2_allocate_extend_trans+0x180/0x180 [ocfs2]
>> [<ffffffffa0467b47>] ocfs2_page_mkwrite+0x1c7/0x2a0 [ocfs2]
>> [<ffffffff811cf286>] do_page_mkwrite+0x66/0xc0
>> [<ffffffff811d3635>] handle_mm_fault+0x685/0x1350
>> [<ffffffff81039dc0>] ? __fpu__restore_sig+0x70/0x530
>> [<ffffffff810694c8>] __do_page_fault+0x1d8/0x4d0
>> [<ffffffff81069827>] trace_do_page_fault+0x37/0xf0
>> [<ffffffff81061e69>] do_async_page_fault+0x19/0x70
>> [<ffffffff8170ac98>] async_page_fault+0x28/0x30
>>
>> In ocfs2_write_begin_nolock(), we first grab the pages and then
>> allocate disk space for this write; ocfs2_try_to_free_truncate_log()
>> will be called if ENOSPC is turned; if we're lucky to get enough clusters,
>> which is usually the case, we start over again. But in ocfs2_free_write_ctxt()
>> the target page isn't unlocked, so we will deadlock when trying to grab
>> the target page again.
> IMO, in ocfs2_grab_pages_for_write, mmap_page is mapping to w_pages and
> w_target_locked is set to true, and then will be unlocked by
> ocfs2_unlock_pages in ocfs2_free_write_ctxt.
> So I'm not getting the case "page isn't unlock". Could you please explain
> it in more detail?
Thanks for review;-) Follow up the calling chain:

ocfs2_free_write_ctxt()
  ->ocfs2_unlock_pages()

in ocfs2_unlock_pages (https://github.com/torvalds/linux/blob/master/fs/ocfs2/aops.c#L793), we
can see the code just put_page(target_page), but not unlock it.

Yeah, I will think this a bit more like:
why not unlock the target_page there? Is there other potential problems if the "ret" is not 
"-ENOSPC" but
other possible error code?

Thanks,
Eric

>
> Thanks,
> Joseph
>
>> Fix this issue by unlocking the target page after we fail to allocate
>> enough space at the first time.
>>
>> Jan Kara helps me clear out the JBD2 part, and suggest the hint for root cause.
>>
>> Signed-off-by: Eric Ren <zren@suse.com>
>> ---
>>   fs/ocfs2/aops.c | 7 +++++++
>>   1 file changed, 7 insertions(+)
>>
>> diff --git a/fs/ocfs2/aops.c b/fs/ocfs2/aops.c
>> index 98d3654..78d1d67 100644
>> --- a/fs/ocfs2/aops.c
>> +++ b/fs/ocfs2/aops.c
>> @@ -1860,6 +1860,13 @@ out:
>>   		 */
>>   		try_free = 0;
>>   
>> +		/*
>> +		 * Unlock mmap_page because the page has been locked when we
>> +		 * are here.
>> +		 */
>> +		if (mmap_page)
>> +			unlock_page(mmap_page);
>> +
>>   		ret1 = ocfs2_try_to_free_truncate_log(osb, clusters_need);
>>   		if (ret1 == 1)
>>   			goto try_again;
>>
>
>

-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://oss.oracle.com/pipermail/ocfs2-devel/attachments/20160912/73d4d6f7/attachment.html 

^ permalink raw reply	[flat|nested] 10+ messages in thread

* [Ocfs2-devel] [PATCH] ocfs2: fix deadlock on mmapped page in ocfs2_write_begin_nolock()
  2016-09-12  2:04   ` Eric Ren
@ 2016-09-12  3:06     ` Eric Ren
  2016-09-14  8:08       ` Eric Ren
  2016-09-14  8:04     ` Eric Ren
  1 sibling, 1 reply; 10+ messages in thread
From: Eric Ren @ 2016-09-12  3:06 UTC (permalink / raw)
  To: ocfs2-devel

Hi,
>> IMO, in ocfs2_grab_pages_for_write, mmap_page is mapping to w_pages and
>> w_target_locked is set to true, and then will be unlocked by
>> ocfs2_unlock_pages in ocfs2_free_write_ctxt.
>> So I'm not getting the case "page isn't unlock". Could you please explain
>> it in more detail?
> Thanks for review;-) Follow up the calling chain:
>
> ocfs2_free_write_ctxt()
>  ->ocfs2_unlock_pages()
>
> in ocfs2_unlock_pages 
> (https://github.com/torvalds/linux/blob/master/fs/ocfs2/aops.c#L793), we
> can see the code just put_page(target_page), but not unlock it.
>
> Yeah, I will think this a bit more like:
> why not unlock the target_page there? Is there other potential problems if the "ret" is 
> not "-ENOSPC" but
> other possible error code?
1. ocfs2_unlock_pages() will be called in ocfs2_write_end_nolock(), in this case, we 
definitely want to return a locked mmaped page
to VM code (do_page_mkwrite) when VM_FAULT_LOCKED is set.

2. But there's indeed a potential existing deadlock situation:
                                ocfs2_grab_pages_for_write()  ==> return -ENOMEM and with 
the mmaped page locked
                                ocfs2_free_write_ctxt() ==> leave the mmapped page locked
                          ocfs2_write_begin_nolock() ==> return -ENOMEM
                    __ocfs2_page_mkwrite() ==> return VM_FAULT_OMM
              __do_page_mkwrite() ==> deadlock here 
(https://github.com/torvalds/linux/blob/master/mm/memory.c#L2054)
This is another corner case, right?

Anyway, I think this patch is good for the -ENOSPC case. And another patch should be 
proposed for -ENOMEM case?

Thanks,
Eric

>
> Thanks,
> Eric
>
>>
>> Thanks,
>> Joseph
>>
>>> Fix this issue by unlocking the target page after we fail to allocate
>>> enough space at the first time.
>>>
>>> Jan Kara helps me clear out the JBD2 part, and suggest the hint for root cause.
>>>
>>> Signed-off-by: Eric Ren <zren@suse.com>
>>> ---
>>>   fs/ocfs2/aops.c | 7 +++++++
>>>   1 file changed, 7 insertions(+)
>>>
>>> diff --git a/fs/ocfs2/aops.c b/fs/ocfs2/aops.c
>>> index 98d3654..78d1d67 100644
>>> --- a/fs/ocfs2/aops.c
>>> +++ b/fs/ocfs2/aops.c
>>> @@ -1860,6 +1860,13 @@ out:
>>>            */
>>>           try_free = 0;
>>>   +        /*
>>> +         * Unlock mmap_page because the page has been locked when we
>>> +         * are here.
>>> +         */
>>> +        if (mmap_page)
>>> +            unlock_page(mmap_page);
>>> +
>>>           ret1 = ocfs2_try_to_free_truncate_log(osb, clusters_need);
>>>           if (ret1 == 1)
>>>               goto try_again;
>>>
>>
>>
>
>

^ permalink raw reply	[flat|nested] 10+ messages in thread

* [Ocfs2-devel] [PATCH] ocfs2: fix deadlock on mmapped page in ocfs2_write_begin_nolock()
  2016-09-12  2:04   ` Eric Ren
  2016-09-12  3:06     ` Eric Ren
@ 2016-09-14  8:04     ` Eric Ren
  2016-09-14  8:25       ` Joseph Qi
  1 sibling, 1 reply; 10+ messages in thread
From: Eric Ren @ 2016-09-14  8:04 UTC (permalink / raw)
  To: ocfs2-devel

Hi Joseph,
>>> In ocfs2_write_begin_nolock(), we first grab the pages and then
>>> allocate disk space for this write; ocfs2_try_to_free_truncate_log()
>>> will be called if ENOSPC is turned; if we're lucky to get enough clusters,
>>> which is usually the case, we start over again. But in ocfs2_free_write_ctxt()
>>> the target page isn't unlocked, so we will deadlock when trying to grab
>>> the target page again.
>> IMO, in ocfs2_grab_pages_for_write, mmap_page is mapping to w_pages and
>> w_target_locked is set to true, and then will be unlocked by
>> ocfs2_unlock_pages in ocfs2_free_write_ctxt.
>> So I'm not getting the case "page isn't unlock". Could you please explain
>> it in more detail?
> Thanks for review;-) Follow up the calling chain:
>
> ocfs2_free_write_ctxt()
>  ->ocfs2_unlock_pages()
>
> in ocfs2_unlock_pages 
> (https://github.com/torvalds/linux/blob/master/fs/ocfs2/aops.c#L793), we
> can see the code just put_page(target_page), but not unlock it.
Did this answer your question?

Thanks,
Eric
>
> Yeah, I will think this a bit more like:
> why not unlock the target_page there? Is there other potential problems if the "ret" is 
> not "-ENOSPC" but
> other possible error code?
>
> Thanks,
> Eric
>
>>
>> Thanks,
>> Joseph
>>
>>> Fix this issue by unlocking the target page after we fail to allocate
>>> enough space at the first time.
>>>
>>> Jan Kara helps me clear out the JBD2 part, and suggest the hint for root cause.
>>>
>>> Signed-off-by: Eric Ren <zren@suse.com>
>>> ---
>>>   fs/ocfs2/aops.c | 7 +++++++
>>>   1 file changed, 7 insertions(+)
>>>
>>> diff --git a/fs/ocfs2/aops.c b/fs/ocfs2/aops.c
>>> index 98d3654..78d1d67 100644
>>> --- a/fs/ocfs2/aops.c
>>> +++ b/fs/ocfs2/aops.c
>>> @@ -1860,6 +1860,13 @@ out:
>>>            */
>>>           try_free = 0;
>>>   +        /*
>>> +         * Unlock mmap_page because the page has been locked when we
>>> +         * are here.
>>> +         */
>>> +        if (mmap_page)
>>> +            unlock_page(mmap_page);
>>> +
>>>           ret1 = ocfs2_try_to_free_truncate_log(osb, clusters_need);
>>>           if (ret1 == 1)
>>>               goto try_again;
>>>
>>
>>
>
>
>
>
> _______________________________________________
> Ocfs2-devel mailing list
> Ocfs2-devel at oss.oracle.com
> https://oss.oracle.com/mailman/listinfo/ocfs2-devel


-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://oss.oracle.com/pipermail/ocfs2-devel/attachments/20160914/f04177a5/attachment.html 

^ permalink raw reply	[flat|nested] 10+ messages in thread

* [Ocfs2-devel] [PATCH] ocfs2: fix deadlock on mmapped page in ocfs2_write_begin_nolock()
  2016-09-12  3:06     ` Eric Ren
@ 2016-09-14  8:08       ` Eric Ren
  0 siblings, 0 replies; 10+ messages in thread
From: Eric Ren @ 2016-09-14  8:08 UTC (permalink / raw)
  To: ocfs2-devel

Hi,

On 09/12/2016 11:06 AM, Eric Ren wrote:
> Hi,
>>> IMO, in ocfs2_grab_pages_for_write, mmap_page is mapping to w_pages and
>>> w_target_locked is set to true, and then will be unlocked by
>>> ocfs2_unlock_pages in ocfs2_free_write_ctxt.
>>> So I'm not getting the case "page isn't unlock". Could you please explain
>>> it in more detail?
>> Thanks for review;-) Follow up the calling chain:
>>
>> ocfs2_free_write_ctxt()
>>   ->ocfs2_unlock_pages()
>>
>> in ocfs2_unlock_pages
>> (https://github.com/torvalds/linux/blob/master/fs/ocfs2/aops.c#L793), we
>> can see the code just put_page(target_page), but not unlock it.
>>
>> Yeah, I will think this a bit more like:
>> why not unlock the target_page there? Is there other potential problems if the "ret" is
>> not "-ENOSPC" but
>> other possible error code?
> 1. ocfs2_unlock_pages() will be called in ocfs2_write_end_nolock(), in this case, we
> definitely want to return a locked mmaped page
> to VM code (do_page_mkwrite) when VM_FAULT_LOCKED is set.
>
> 2. But there's indeed a potential existing deadlock situation:
>                                  ocfs2_grab_pages_for_write()  ==> return -ENOMEM and with
> the mmaped page locked
>                                  ocfs2_free_write_ctxt() ==> leave the mmapped page locked
>                            ocfs2_write_begin_nolock() ==> return -ENOMEM
>                      __ocfs2_page_mkwrite() ==> return VM_FAULT_OMM
>                __do_page_mkwrite() ==> deadlock here
> (https://github.com/torvalds/linux/blob/master/mm/memory.c#L2054)
> This is another corner case, right?
>
> Anyway, I think this patch is good for the -ENOSPC case. And another patch should be
> proposed for -ENOMEM case?
Yes, I think we can catch both -ENOSPC and -ENOMEM cases in the failure path by unlocking the
mmaped page after ocfs2_free_write_ctx(), right?

Eric
>
> Thanks,
> Eric
>
>> Thanks,
>> Eric
>>
>>> Thanks,
>>> Joseph
>>>
>>>> Fix this issue by unlocking the target page after we fail to allocate
>>>> enough space at the first time.
>>>>
>>>> Jan Kara helps me clear out the JBD2 part, and suggest the hint for root cause.
>>>>
>>>> Signed-off-by: Eric Ren <zren@suse.com>
>>>> ---
>>>>    fs/ocfs2/aops.c | 7 +++++++
>>>>    1 file changed, 7 insertions(+)
>>>>
>>>> diff --git a/fs/ocfs2/aops.c b/fs/ocfs2/aops.c
>>>> index 98d3654..78d1d67 100644
>>>> --- a/fs/ocfs2/aops.c
>>>> +++ b/fs/ocfs2/aops.c
>>>> @@ -1860,6 +1860,13 @@ out:
>>>>             */
>>>>            try_free = 0;
>>>>    +        /*
>>>> +         * Unlock mmap_page because the page has been locked when we
>>>> +         * are here.
>>>> +         */
>>>> +        if (mmap_page)
>>>> +            unlock_page(mmap_page);
>>>> +
>>>>            ret1 = ocfs2_try_to_free_truncate_log(osb, clusters_need);
>>>>            if (ret1 == 1)
>>>>                goto try_again;
>>>>
>>>
>>
>
> _______________________________________________
> Ocfs2-devel mailing list
> Ocfs2-devel at oss.oracle.com
> https://oss.oracle.com/mailman/listinfo/ocfs2-devel
>

^ permalink raw reply	[flat|nested] 10+ messages in thread

* [Ocfs2-devel] [PATCH] ocfs2: fix deadlock on mmapped page in ocfs2_write_begin_nolock()
  2016-09-14  8:04     ` Eric Ren
@ 2016-09-14  8:25       ` Joseph Qi
  2016-09-14  8:43         ` Eric Ren
  0 siblings, 1 reply; 10+ messages in thread
From: Joseph Qi @ 2016-09-14  8:25 UTC (permalink / raw)
  To: ocfs2-devel

Hi Eric,
Sorry for the delayed response.
I have got your explanation. So we have to unlock the page only in case
of retry, right?
If so, I think the unlock should be right before "goto try_again".

Thanks,
Joseph

On 2016/9/14 16:04, Eric Ren wrote:
> Hi Joseph,
>>>> In ocfs2_write_begin_nolock(), we first grab the pages and then
>>>> allocate disk space for this write; ocfs2_try_to_free_truncate_log()
>>>> will be called if ENOSPC is turned; if we're lucky to get enough clusters,
>>>> which is usually the case, we start over again. But in ocfs2_free_write_ctxt()
>>>> the target page isn't unlocked, so we will deadlock when trying to grab
>>>> the target page again.
>>> IMO, in ocfs2_grab_pages_for_write, mmap_page is mapping to w_pages and
>>> w_target_locked is set to true, and then will be unlocked by
>>> ocfs2_unlock_pages in ocfs2_free_write_ctxt.
>>> So I'm not getting the case "page isn't unlock". Could you please explain
>>> it in more detail?
>> Thanks for review;-) Follow up the calling chain:
>>
>> ocfs2_free_write_ctxt()
>>  ->ocfs2_unlock_pages()
>>
>> in ocfs2_unlock_pages (https://github.com/torvalds/linux/blob/master/fs/ocfs2/aops.c#L793), we
>> can see the code just put_page(target_page), but not unlock it.
> Did this answer your question?
> 
> Thanks,
> Eric
>>
>> Yeah, I will think this a bit more like:
>> why not unlock the target_page there? Is there other potential problems if the "ret" is not "-ENOSPC" but
>> other possible error code?
>>
>> Thanks,
>> Eric
>>
>>>
>>> Thanks,
>>> Joseph
>>>
>>>> Fix this issue by unlocking the target page after we fail to allocate
>>>> enough space at the first time.
>>>>
>>>> Jan Kara helps me clear out the JBD2 part, and suggest the hint for root cause.
>>>>
>>>> Signed-off-by: Eric Ren <zren@suse.com>
>>>> ---
>>>>   fs/ocfs2/aops.c | 7 +++++++
>>>>   1 file changed, 7 insertions(+)
>>>>
>>>> diff --git a/fs/ocfs2/aops.c b/fs/ocfs2/aops.c
>>>> index 98d3654..78d1d67 100644
>>>> --- a/fs/ocfs2/aops.c
>>>> +++ b/fs/ocfs2/aops.c
>>>> @@ -1860,6 +1860,13 @@ out:
>>>>            */
>>>>           try_free = 0;
>>>>   +        /*
>>>> +         * Unlock mmap_page because the page has been locked when we
>>>> +         * are here.
>>>> +         */
>>>> +        if (mmap_page)
>>>> +            unlock_page(mmap_page);
>>>> +
>>>>           ret1 = ocfs2_try_to_free_truncate_log(osb, clusters_need);
>>>>           if (ret1 == 1)
>>>>               goto try_again;
>>>>
>>>
>>>
>>
>>
>>
>>
>> _______________________________________________
>> Ocfs2-devel mailing list
>> Ocfs2-devel at oss.oracle.com
>> https://oss.oracle.com/mailman/listinfo/ocfs2-devel
> 
> 

^ permalink raw reply	[flat|nested] 10+ messages in thread

* [Ocfs2-devel] [PATCH] ocfs2: fix deadlock on mmapped page in ocfs2_write_begin_nolock()
  2016-09-14  8:25       ` Joseph Qi
@ 2016-09-14  8:43         ` Eric Ren
  2016-09-14  9:05           ` Joseph Qi
  0 siblings, 1 reply; 10+ messages in thread
From: Eric Ren @ 2016-09-14  8:43 UTC (permalink / raw)
  To: ocfs2-devel

Hi Joseph,

On 09/14/2016 04:25 PM, Joseph Qi wrote:
> Hi Eric,
> Sorry for the delayed response.
> I have got your explanation. So we have to unlock the page only in case
> of retry, right?
> If so, I think the unlock should be right before "goto try_again".
No, the mmapped page should be unlocked as long as we cannot return VM_FAULT_LOCKED
to do_page_mkpage(). Otherwise, the deadlock will happen in do_page_mkpage(). Please
see the recent 2 mails;-)

Eric
>
> Thanks,
> Joseph
>
> On 2016/9/14 16:04, Eric Ren wrote:
>> Hi Joseph,
>>>>> In ocfs2_write_begin_nolock(), we first grab the pages and then
>>>>> allocate disk space for this write; ocfs2_try_to_free_truncate_log()
>>>>> will be called if ENOSPC is turned; if we're lucky to get enough clusters,
>>>>> which is usually the case, we start over again. But in ocfs2_free_write_ctxt()
>>>>> the target page isn't unlocked, so we will deadlock when trying to grab
>>>>> the target page again.
>>>> IMO, in ocfs2_grab_pages_for_write, mmap_page is mapping to w_pages and
>>>> w_target_locked is set to true, and then will be unlocked by
>>>> ocfs2_unlock_pages in ocfs2_free_write_ctxt.
>>>> So I'm not getting the case "page isn't unlock". Could you please explain
>>>> it in more detail?
>>> Thanks for review;-) Follow up the calling chain:
>>>
>>> ocfs2_free_write_ctxt()
>>>   ->ocfs2_unlock_pages()
>>>
>>> in ocfs2_unlock_pages (https://github.com/torvalds/linux/blob/master/fs/ocfs2/aops.c#L793), we
>>> can see the code just put_page(target_page), but not unlock it.
>> Did this answer your question?
>>
>> Thanks,
>> Eric
>>> Yeah, I will think this a bit more like:
>>> why not unlock the target_page there? Is there other potential problems if the "ret" is not "-ENOSPC" but
>>> other possible error code?
>>>
>>> Thanks,
>>> Eric
>>>
>>>> Thanks,
>>>> Joseph
>>>>
>>>>> Fix this issue by unlocking the target page after we fail to allocate
>>>>> enough space at the first time.
>>>>>
>>>>> Jan Kara helps me clear out the JBD2 part, and suggest the hint for root cause.
>>>>>
>>>>> Signed-off-by: Eric Ren <zren@suse.com>
>>>>> ---
>>>>>    fs/ocfs2/aops.c | 7 +++++++
>>>>>    1 file changed, 7 insertions(+)
>>>>>
>>>>> diff --git a/fs/ocfs2/aops.c b/fs/ocfs2/aops.c
>>>>> index 98d3654..78d1d67 100644
>>>>> --- a/fs/ocfs2/aops.c
>>>>> +++ b/fs/ocfs2/aops.c
>>>>> @@ -1860,6 +1860,13 @@ out:
>>>>>             */
>>>>>            try_free = 0;
>>>>>    +        /*
>>>>> +         * Unlock mmap_page because the page has been locked when we
>>>>> +         * are here.
>>>>> +         */
>>>>> +        if (mmap_page)
>>>>> +            unlock_page(mmap_page);
>>>>> +
>>>>>            ret1 = ocfs2_try_to_free_truncate_log(osb, clusters_need);
>>>>>            if (ret1 == 1)
>>>>>                goto try_again;
>>>>>
>>>>
>>>
>>>
>>>
>>> _______________________________________________
>>> Ocfs2-devel mailing list
>>> Ocfs2-devel at oss.oracle.com
>>> https://oss.oracle.com/mailman/listinfo/ocfs2-devel
>>
>
>

^ permalink raw reply	[flat|nested] 10+ messages in thread

* [Ocfs2-devel] [PATCH] ocfs2: fix deadlock on mmapped page in ocfs2_write_begin_nolock()
  2016-09-14  8:43         ` Eric Ren
@ 2016-09-14  9:05           ` Joseph Qi
  0 siblings, 0 replies; 10+ messages in thread
From: Joseph Qi @ 2016-09-14  9:05 UTC (permalink / raw)
  To: ocfs2-devel

Okay, IC.
So we have to take care of all errors for ocfs2_write_begin_nolock.

On 2016/9/14 16:43, Eric Ren wrote:
> Hi Joseph,
> 
> On 09/14/2016 04:25 PM, Joseph Qi wrote:
>> Hi Eric,
>> Sorry for the delayed response.
>> I have got your explanation. So we have to unlock the page only in case
>> of retry, right?
>> If so, I think the unlock should be right before "goto try_again".
> No, the mmapped page should be unlocked as long as we cannot return VM_FAULT_LOCKED
> to do_page_mkpage(). Otherwise, the deadlock will happen in do_page_mkpage(). Please
> see the recent 2 mails;-)
> 
> Eric
>>
>> Thanks,
>> Joseph
>>
>> On 2016/9/14 16:04, Eric Ren wrote:
>>> Hi Joseph,
>>>>>> In ocfs2_write_begin_nolock(), we first grab the pages and then
>>>>>> allocate disk space for this write; ocfs2_try_to_free_truncate_log()
>>>>>> will be called if ENOSPC is turned; if we're lucky to get enough clusters,
>>>>>> which is usually the case, we start over again. But in ocfs2_free_write_ctxt()
>>>>>> the target page isn't unlocked, so we will deadlock when trying to grab
>>>>>> the target page again.
>>>>> IMO, in ocfs2_grab_pages_for_write, mmap_page is mapping to w_pages and
>>>>> w_target_locked is set to true, and then will be unlocked by
>>>>> ocfs2_unlock_pages in ocfs2_free_write_ctxt.
>>>>> So I'm not getting the case "page isn't unlock". Could you please explain
>>>>> it in more detail?
>>>> Thanks for review;-) Follow up the calling chain:
>>>>
>>>> ocfs2_free_write_ctxt()
>>>>   ->ocfs2_unlock_pages()
>>>>
>>>> in ocfs2_unlock_pages (https://github.com/torvalds/linux/blob/master/fs/ocfs2/aops.c#L793), we
>>>> can see the code just put_page(target_page), but not unlock it.
>>> Did this answer your question?
>>>
>>> Thanks,
>>> Eric
>>>> Yeah, I will think this a bit more like:
>>>> why not unlock the target_page there? Is there other potential problems if the "ret" is not "-ENOSPC" but
>>>> other possible error code?
>>>>
>>>> Thanks,
>>>> Eric
>>>>
>>>>> Thanks,
>>>>> Joseph
>>>>>
>>>>>> Fix this issue by unlocking the target page after we fail to allocate
>>>>>> enough space at the first time.
>>>>>>
>>>>>> Jan Kara helps me clear out the JBD2 part, and suggest the hint for root cause.
>>>>>>
>>>>>> Signed-off-by: Eric Ren <zren@suse.com>
>>>>>> ---
>>>>>>    fs/ocfs2/aops.c | 7 +++++++
>>>>>>    1 file changed, 7 insertions(+)
>>>>>>
>>>>>> diff --git a/fs/ocfs2/aops.c b/fs/ocfs2/aops.c
>>>>>> index 98d3654..78d1d67 100644
>>>>>> --- a/fs/ocfs2/aops.c
>>>>>> +++ b/fs/ocfs2/aops.c
>>>>>> @@ -1860,6 +1860,13 @@ out:
>>>>>>             */
>>>>>>            try_free = 0;
>>>>>>    +        /*
>>>>>> +         * Unlock mmap_page because the page has been locked when we
>>>>>> +         * are here.
>>>>>> +         */
>>>>>> +        if (mmap_page)
>>>>>> +            unlock_page(mmap_page);
>>>>>> +
>>>>>>            ret1 = ocfs2_try_to_free_truncate_log(osb, clusters_need);
>>>>>>            if (ret1 == 1)
>>>>>>                goto try_again;
>>>>>>
>>>>>
>>>>
>>>>
>>>>
>>>> _______________________________________________
>>>> Ocfs2-devel mailing list
>>>> Ocfs2-devel at oss.oracle.com
>>>> https://oss.oracle.com/mailman/listinfo/ocfs2-devel
>>>
>>
>>
> 
> 
> .
> 

^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2016-09-14  9:05 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-09-10  9:55 [Ocfs2-devel] [PATCH] ocfs2: fix deadlock on mmapped page in ocfs2_write_begin_nolock() Eric Ren
2016-09-12  1:37 ` Joseph Qi
2016-09-12  2:04   ` Eric Ren
2016-09-12  3:06     ` Eric Ren
2016-09-14  8:08       ` Eric Ren
2016-09-14  8:04     ` Eric Ren
2016-09-14  8:25       ` Joseph Qi
2016-09-14  8:43         ` Eric Ren
2016-09-14  9:05           ` Joseph Qi
2016-09-12  2:03 ` Gang He

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.