linux-next.vger.kernel.org archive mirror
* [PATCH -next] ashmem: Fix ashmem_shrink deadlock.
@ 2013-05-01 13:56 Robert Love
  2013-05-01 15:54 ` David Rientjes
                   ` (2 more replies)
  0 siblings, 3 replies; 15+ messages in thread
From: Robert Love @ 2013-05-01 13:56 UTC (permalink / raw)
  To: Greg Kroah-Hartman, Shankar Brahadeeswaran, Dan Carpenter, LKML,
	Bjorn Bringert, devel, Hugh Dickins, Anjana V Kumar, Dad,
	linux-next

Don't acquire ashmem_mutex in ashmem_shrink if we've somehow recursed into the
shrinker code from within ashmem. Just bail out, avoiding a deadlock. This is
fine, as ashmem cache pruning is advisory anyhow.

Signed-off-by: Robert Love <rlove@google.com>
---
 drivers/staging/android/ashmem.c | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/drivers/staging/android/ashmem.c b/drivers/staging/android/ashmem.c
index e681bdd..82c6768 100644
--- a/drivers/staging/android/ashmem.c
+++ b/drivers/staging/android/ashmem.c
@@ -363,7 +363,11 @@ static int ashmem_shrink(struct shrinker *s, struct shrink_control *sc)
 	if (!sc->nr_to_scan)
 		return lru_count;
 
-	mutex_lock(&ashmem_mutex);
+	/* avoid recursing into this code from within ashmem itself */
+	if (!mutex_trylock(&ashmem_mutex)) {
+		return -1;
+	}
+
 	list_for_each_entry_safe(range, next, &ashmem_lru_list, lru) {
 		loff_t start = range->pgstart * PAGE_SIZE;
 		loff_t end = (range->pgend + 1) * PAGE_SIZE;
-- 
1.8.2.1

^ permalink raw reply related	[flat|nested] 15+ messages in thread

* Re: [PATCH -next] ashmem: Fix ashmem_shrink deadlock.
  2013-05-01 13:56 [PATCH -next] ashmem: Fix ashmem_shrink deadlock Robert Love
@ 2013-05-01 15:54 ` David Rientjes
  2013-05-02 18:22   ` David Rientjes
  2013-05-07 20:52 ` Andrew Morton
  2013-05-13 21:42 ` Greg Kroah-Hartman
  2 siblings, 1 reply; 15+ messages in thread
From: David Rientjes @ 2013-05-01 15:54 UTC (permalink / raw)
  To: Robert Love
  Cc: Greg Kroah-Hartman, Shankar Brahadeeswaran, Dan Carpenter, LKML,
	Bjorn Bringert, devel, Hugh Dickins, Anjana V Kumar,
	Andrew Morton, linux-next

On Wed, 1 May 2013, Robert Love wrote:

> Don't acquire ashmem_mutex in ashmem_shrink if we've somehow recursed into the
> shrinker code from within ashmem. Just bail out, avoiding a deadlock. This is
> fine, as ashmem cache pruning is advisory anyhow.
> 
> Signed-off-by: Robert Love <rlove@google.com>

Any reason not to send this to stable@vger.kernel.org if it fixes an 
observable deadlock?  (It's annotated to be applied to linux-next, but I 
don't see any differences between it and Linus's tree.)

* Re: [PATCH -next] ashmem: Fix ashmem_shrink deadlock.
  2013-05-01 15:54 ` David Rientjes
@ 2013-05-02 18:22   ` David Rientjes
  2013-05-02 20:39     ` Greg Kroah-Hartman
  0 siblings, 1 reply; 15+ messages in thread
From: David Rientjes @ 2013-05-02 18:22 UTC (permalink / raw)
  To: Robert Love, Greg Kroah-Hartman
  Cc: Shankar Brahadeeswaran, Dan Carpenter, LKML, Bjorn Bringert,
	devel, Hugh Dickins, Anjana V Kumar, Andrew Morton, linux-next

On Wed, 1 May 2013, David Rientjes wrote:

> > Don't acquire ashmem_mutex in ashmem_shrink if we've somehow recursed into the
> > shrinker code from within ashmem. Just bail out, avoiding a deadlock. This is
> > fine, as ashmem cache pruning is advisory anyhow.
> > 
> > Signed-off-by: Robert Love <rlove@google.com>
> 
> Any reason not to send this to stable@vger.kernel.org if it fixes an 
> observable deadlock?  (It's annotated to be applied to linux-next, but I 
> don't see any differences between it and Linus's tree.)
> 

This was sent separately to stable@vger.kernel.org before being merged 
into Linus's tree.  Greg, could this be queued up for 3.10 with a cc to 
stable@vger.kernel.org?

* Re: [PATCH -next] ashmem: Fix ashmem_shrink deadlock.
  2013-05-02 18:22   ` David Rientjes
@ 2013-05-02 20:39     ` Greg Kroah-Hartman
  0 siblings, 0 replies; 15+ messages in thread
From: Greg Kroah-Hartman @ 2013-05-02 20:39 UTC (permalink / raw)
  To: David Rientjes
  Cc: Robert Love, devel, Shankar Brahadeeswaran, Bjorn Bringert,
	Hugh Dickins, LKML, linux-next, Anjana V Kumar, Andrew Morton,
	Dan Carpenter

On Thu, May 02, 2013 at 11:22:18AM -0700, David Rientjes wrote:
> On Wed, 1 May 2013, David Rientjes wrote:
> 
> > > Don't acquire ashmem_mutex in ashmem_shrink if we've somehow recursed into the
> > > shrinker code from within ashmem. Just bail out, avoiding a deadlock. This is
> > > fine, as ashmem cache pruning is advisory anyhow.
> > > 
> > > Signed-off-by: Robert Love <rlove@google.com>
> > 
> > Any reason not to send this to stable@vger.kernel.org if it fixes an 
> > observable deadlock?  (It's annotated to be applied to linux-next, but I 
> > don't see any differences between it and Linus's tree.)
> > 
> 
> This was sent separately to stable@vger.kernel.org before being merged 
> into Linus's tree.  Greg, could this be queued up for 3.10 with a cc to 
> stable@vger.kernel.org?

Yes, I'll handle all of this properly, thanks.

greg k-h

* Re: [PATCH -next] ashmem: Fix ashmem_shrink deadlock.
  2013-05-01 13:56 [PATCH -next] ashmem: Fix ashmem_shrink deadlock Robert Love
  2013-05-01 15:54 ` David Rientjes
@ 2013-05-07 20:52 ` Andrew Morton
  2013-05-13 21:42 ` Greg Kroah-Hartman
  2 siblings, 0 replies; 15+ messages in thread
From: Andrew Morton @ 2013-05-07 20:52 UTC (permalink / raw)
  To: Robert Love
  Cc: Greg Kroah-Hartman, Shankar Brahadeeswaran, Dan Carpenter, LKML,
	Bjorn Bringert, devel, Hugh Dickins, Anjana V Kumar, linux-next

On Wed,  1 May 2013 09:56:13 -0400 Robert Love <rlove@google.com> wrote:

> Don't acquire ashmem_mutex in ashmem_shrink if we've somehow recursed into the
> shrinker code from within ashmem. Just bail out, avoiding a deadlock. This is
> fine, as ashmem cache pruning is advisory anyhow.
> 

Sorry, but I don't think "somehow" is an adequate description of a
kernel bug.  The deadlock should be described with specificity, so that
others can understand and review the fix and perhaps suggest
alternative implementations.

Presumably someone is performing a memory allocation while holding
ashmem_mutex.  A more idiomatic way of avoiding a call to direct
reclaim in these circumstances would be for the task to set its
PF_MEMALLOC flag, or to use GFP_ATOMIC.  But without any details that's
as far as I can go.

> --- a/drivers/staging/android/ashmem.c
> +++ b/drivers/staging/android/ashmem.c
> @@ -363,7 +363,11 @@ static int ashmem_shrink(struct shrinker *s, struct shrink_control *sc)
>  	if (!sc->nr_to_scan)
>  		return lru_count;
>  
> -	mutex_lock(&ashmem_mutex);
> +	/* avoid recursing into this code from within ashmem itself */
> +	if (!mutex_trylock(&ashmem_mutex)) {
> +		return -1;
> +	}

This is rather hacky.  It consumes more CPU than the above approaches,
and more stack.

Worst of all, it obviously hasn't met checkpatch.pl ;)

* Re: [PATCH -next] ashmem: Fix ashmem_shrink deadlock.
  2013-05-01 13:56 [PATCH -next] ashmem: Fix ashmem_shrink deadlock Robert Love
  2013-05-01 15:54 ` David Rientjes
  2013-05-07 20:52 ` Andrew Morton
@ 2013-05-13 21:42 ` Greg Kroah-Hartman
  2013-05-14  3:29   ` Neil Zhang
  2 siblings, 1 reply; 15+ messages in thread
From: Greg Kroah-Hartman @ 2013-05-13 21:42 UTC (permalink / raw)
  To: Robert Love
  Cc: Shankar Brahadeeswaran, Dan Carpenter, LKML, Bjorn Bringert,
	devel, Hugh Dickins, Anjana V Kumar, Dad, linux-next

On Wed, May 01, 2013 at 09:56:13AM -0400, Robert Love wrote:
> Don't acquire ashmem_mutex in ashmem_shrink if we've somehow recursed into the
> shrinker code from within ashmem. Just bail out, avoiding a deadlock. This is
> fine, as ashmem cache pruning is advisory anyhow.
> 
> Signed-off-by: Robert Love <rlove@google.com>
> ---
>  drivers/staging/android/ashmem.c | 6 +++++-
>  1 file changed, 5 insertions(+), 1 deletion(-)

Based on Andrew's review comments, I'll drop this from my queue and wait
for a "better" fix for this.

thanks,

greg k-h

* Re: [PATCH -next] ashmem: Fix ashmem_shrink deadlock.
  2013-05-13 21:42 ` Greg Kroah-Hartman
@ 2013-05-14  3:29   ` Neil Zhang
  2013-05-14  3:37     ` Raul Xiong
  0 siblings, 1 reply; 15+ messages in thread
From: Neil Zhang @ 2013-05-14  3:29 UTC (permalink / raw)
  To: Greg Kroah-Hartman
  Cc: raulxiong, Robert Love, Shankar Brahadeeswaran, Dan Carpenter,
	LKML, Bjorn Bringert, devel, Hugh Dickins, Anjana V Kumar, Dad,
	linux-next

2013/5/14 Greg Kroah-Hartman <gregkh@linuxfoundation.org>:
> On Wed, May 01, 2013 at 09:56:13AM -0400, Robert Love wrote:
>> Don't acquire ashmem_mutex in ashmem_shrink if we've somehow recursed
>> into the
>> shrinker code from within ashmem. Just bail out, avoiding a deadlock.
>> This is
>> fine, as ashmem cache pruning is advisory anyhow.
>>
>> Signed-off-by: Robert Love <rlove@google.com>
>> ---
>>  drivers/staging/android/ashmem.c | 6 +++++-
>>  1 file changed, 5 insertions(+), 1 deletion(-)
>
> Based on Andrew's review comments, I'll drop this from my queue and wait
> for a "better" fix for this.
>
> thanks,
>
> greg k-h
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/

We ran into a similar issue recently.
Adding Raul Xiong so he can paste the call stack.

Best Regards,
Neil Zhang

* Re: [PATCH -next] ashmem: Fix ashmem_shrink deadlock.
  2013-05-14  3:29   ` Neil Zhang
@ 2013-05-14  3:37     ` Raul Xiong
  2013-05-16  8:15       ` Raul Xiong
  0 siblings, 1 reply; 15+ messages in thread
From: Raul Xiong @ 2013-05-14  3:37 UTC (permalink / raw)
  To: Neil Zhang
  Cc: Greg Kroah-Hartman, Robert Love, Shankar Brahadeeswaran,
	Dan Carpenter, LKML, Bjorn Bringert, devel, Hugh Dickins,
	Anjana V Kumar, Dad, linux-next

2013/5/14 Neil Zhang <glacier1980@gmail.com>:
> 2013/5/14 Greg Kroah-Hartman <gregkh@linuxfoundation.org>:
>> On Wed, May 01, 2013 at 09:56:13AM -0400, Robert Love wrote:
>>> Don't acquire ashmem_mutex in ashmem_shrink if we've somehow recursed
>>> into the
>>> shrinker code from within ashmem. Just bail out, avoiding a deadlock.
>>> This is
>>> fine, as ashmem cache pruning is advisory anyhow.
>>>
>>> Signed-off-by: Robert Love <rlove@google.com>
>>> ---
>>>  drivers/staging/android/ashmem.c | 6 +++++-
>>>  1 file changed, 5 insertions(+), 1 deletion(-)
>>
>> Based on Andrew's review comments, I'll drop this from my queue and wait
>> for a "better" fix for this.
>>
>> thanks,
>>
>> greg k-h
>> --
>> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
>> the body of a message to majordomo@vger.kernel.org
>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>> Please read the FAQ at  http://www.tux.org/lkml/
>
> We ran into a similar issue recently.
> Adding Raul Xiong so he can paste the call stack.
>
> Best Regards,
> Neil Zhang

Hi all,
I just ran into this deadlock during a stress test; it is described
clearly by the function stack below. Please take a look and suggest a
proper fix for it.

[<c05d3370>] (__schedule) from [<c05d3818>]
[<c05d3818>] (schedule_preempt_disabled) from [<c05d2578>]
[<c05d2578>] (__mutex_lock_slowpath) from [<c05d263c>]
[<c05d263c>] (mutex_lock) from [<c0441dd8>]
[<c0441dd8>] (ashmem_shrink) from [<c01ae00c>]
[<c01ae00c>] (shrink_slab) from [<c01b0ec8>]
[<c01b0ec8>] (try_to_free_pages) from [<c01a65ec>]
[<c01a65ec>] (__alloc_pages_nodemask) from [<c01d0414>]
[<c01d0414>] (new_slab) from [<c05cf3a0>]
[<c05cf3a0>] (__slab_alloc.isra.46.constprop.52) from [<c01d08cc>]
[<c01d08cc>] (kmem_cache_alloc) from [<c01b1f6c>]
[<c01b1f6c>] (shmem_alloc_inode) from [<c01e8d18>]
[<c01e8d18>] (alloc_inode) from [<c01ea3c4>]
[<c01ea3c4>] (new_inode_pseudo) from [<c01ea404>]
[<c01ea404>] (new_inode) from [<c01b157c>]
[<c01b157c>] (shmem_get_inode) from [<c01b3eac>]
[<c01b3eac>] (shmem_file_setup) from [<c0441d1c>]
[<c0441d1c>] (ashmem_mmap) from [<c01c1908>]
[<c01c1908>] (mmap_region) from [<c01c1eac>]
[<c01c1eac>] (sys_mmap_pgoff) from [<c0112d80>]

Thanks,
Raul Xiong

* Re: [PATCH -next] ashmem: Fix ashmem_shrink deadlock.
  2013-05-14  3:37     ` Raul Xiong
@ 2013-05-16  8:15       ` Raul Xiong
  2013-05-16 13:44         ` Robert Love
  0 siblings, 1 reply; 15+ messages in thread
From: Raul Xiong @ 2013-05-16  8:15 UTC (permalink / raw)
  To: Neil Zhang
  Cc: Greg Kroah-Hartman, Robert Love, Shankar Brahadeeswaran,
	Dan Carpenter, LKML, Bjorn Bringert, devel, Hugh Dickins,
	Anjana V Kumar, Dad, linux-next

2013/5/14 Raul Xiong <raulxiong@gmail.com>
>
> 2013/5/14 Neil Zhang <glacier1980@gmail.com>:
> > 2013/5/14 Greg Kroah-Hartman <gregkh@linuxfoundation.org>:
> >> On Wed, May 01, 2013 at 09:56:13AM -0400, Robert Love wrote:
> >>> Don't acquire ashmem_mutex in ashmem_shrink if we've somehow recursed
> >>> into the
> >>> shrinker code from within ashmem. Just bail out, avoiding a deadlock.
> >>> This is
> >>> fine, as ashmem cache pruning is advisory anyhow.
> >>>
> >>> Signed-off-by: Robert Love <rlove@google.com>
> >>> ---
> >>>  drivers/staging/android/ashmem.c | 6 +++++-
> >>>  1 file changed, 5 insertions(+), 1 deletion(-)
> >>
> >> Based on Andrew's review comments, I'll drop this from my queue and wait
> >> for a "better" fix for this.
> >>
> >> thanks,
> >>
> >> greg k-h
> >> --
> >> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> >> the body of a message to majordomo@vger.kernel.org
> >> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> >> Please read the FAQ at  http://www.tux.org/lkml/
> >
> > We ran into a similar issue recently.
> > Adding Raul Xiong so he can paste the call stack.
> >
> > Best Regards,
> > Neil Zhang
>
> Hi all,
> I just ran into this deadlock during a stress test; it is described
> clearly by the function stack below. Please take a look and suggest a
> proper fix for it.
>
> [<c05d3370>] (__schedule) from [<c05d3818>]
> [<c05d3818>] (schedule_preempt_disabled) from [<c05d2578>]
> [<c05d2578>] (__mutex_lock_slowpath) from [<c05d263c>]
> [<c05d263c>] (mutex_lock) from [<c0441dd8>]
> [<c0441dd8>] (ashmem_shrink) from [<c01ae00c>]
> [<c01ae00c>] (shrink_slab) from [<c01b0ec8>]
> [<c01b0ec8>] (try_to_free_pages) from [<c01a65ec>]
> [<c01a65ec>] (__alloc_pages_nodemask) from [<c01d0414>]
> [<c01d0414>] (new_slab) from [<c05cf3a0>]
> [<c05cf3a0>] (__slab_alloc.isra.46.constprop.52) from [<c01d08cc>]
> [<c01d08cc>] (kmem_cache_alloc) from [<c01b1f6c>]
> [<c01b1f6c>] (shmem_alloc_inode) from [<c01e8d18>]
> [<c01e8d18>] (alloc_inode) from [<c01ea3c4>]
> [<c01ea3c4>] (new_inode_pseudo) from [<c01ea404>]
> [<c01ea404>] (new_inode) from [<c01b157c>]
> [<c01b157c>] (shmem_get_inode) from [<c01b3eac>]
> [<c01b3eac>] (shmem_file_setup) from [<c0441d1c>]
> [<c0441d1c>] (ashmem_mmap) from [<c01c1908>]
> [<c01c1908>] (mmap_region) from [<c01c1eac>]
> [<c01c1eac>] (sys_mmap_pgoff) from [<c0112d80>]
>
> Thanks,
> Raul Xiong



Hi Andrew, Greg,

Any feedback?

The issue happens in the following sequence:
ashmem_mmap acquires ashmem_mutex --> shmem_file_setup
calls kmem_cache_alloc --> shrink due to low memory --> ashmem_shrink
tries to acquire the same ashmem_mutex -- it blocks here.

I think this reports the bug clearly. Please have a look.


Thanks,
Raul Xiong

* Re: [PATCH -next] ashmem: Fix ashmem_shrink deadlock.
  2013-05-16  8:15       ` Raul Xiong
@ 2013-05-16 13:44         ` Robert Love
  2013-05-16 16:45           ` Andrew Morton
  0 siblings, 1 reply; 15+ messages in thread
From: Robert Love @ 2013-05-16 13:44 UTC (permalink / raw)
  To: Raul Xiong
  Cc: Neil Zhang, Greg Kroah-Hartman, Shankar Brahadeeswaran,
	Dan Carpenter, LKML, Bjorn Bringert, devel, Hugh Dickins,
	Anjana V Kumar, Dad, linux-next

On Thu, May 16, 2013 at 4:15 AM, Raul Xiong <raulxiong@gmail.com> wrote:
> The issue happens in the following sequence:
> ashmem_mmap acquires ashmem_mutex --> shmem_file_setup
> calls kmem_cache_alloc --> shrink due to low memory --> ashmem_shrink
> tries to acquire the same ashmem_mutex -- it blocks here.
>
> I think this reports the bug clearly. Please have a look.

There is no debate about the nature of the bug. Only the fix.

My mutex_trylock patch fixes the problem. I prefer that solution.

Andrew's suggestion of GFP_ATOMIC won't work as we'd have to propagate
that down into shmem and elsewhere.

Using PF_MEMALLOC will work. You'd want to define something like:

static int set_memalloc(void)
{
        if (current->flags & PF_MEMALLOC)
                return 0;
        current->flags |= PF_MEMALLOC;
        return 1;
}

static void clear_memalloc(int memalloc)
{
        if (memalloc)
                current->flags &= ~PF_MEMALLOC;
}

and then set/clear PF_MEMALLOC around every memory allocation and
function that descends into a memory allocation. As said I prefer my
solution but if someone wants to put together a patch with this
approach, fine by me.

         Robert

* Re: [PATCH -next] ashmem: Fix ashmem_shrink deadlock.
  2013-05-16 13:44         ` Robert Love
@ 2013-05-16 16:45           ` Andrew Morton
  2013-05-16 17:08             ` Robert Love
  0 siblings, 1 reply; 15+ messages in thread
From: Andrew Morton @ 2013-05-16 16:45 UTC (permalink / raw)
  To: Robert Love
  Cc: Raul Xiong, Neil Zhang, Greg Kroah-Hartman,
	Shankar Brahadeeswaran, Dan Carpenter, LKML, Bjorn Bringert,
	devel, Hugh Dickins, Anjana V Kumar, linux-next

On Thu, 16 May 2013 09:44:49 -0400 Robert Love <rlove@google.com> wrote:

> On Thu, May 16, 2013 at 4:15 AM, Raul Xiong <raulxiong@gmail.com> wrote:
> > The issue happens in the following sequence:
> > ashmem_mmap acquires ashmem_mutex --> shmem_file_setup
> > calls kmem_cache_alloc --> shrink due to low memory --> ashmem_shrink
> > tries to acquire the same ashmem_mutex -- it blocks here.
> >
> > I think this reports the bug clearly. Please have a look.
> 
> There is no debate about the nature of the bug. Only the fix.
> 
> My mutex_trylock patch fixes the problem. I prefer that solution.
> 
> Andrew's suggestion of GFP_ATOMIC won't work as we'd have to propagate
> that down into shmem and elsewhere.

s/won't work/impractical/

A better approach would be to add a new __GFP_NOSHRINKERS, but it's all
variations on a theme.

> Using PF_MEMALLOC will work. You'd want to define something like:
> 
> static int set_memalloc(void)
> {
>         if (current->flags & PF_MEMALLOC)
>                 return 0;
>         current->flags |= PF_MEMALLOC;
>         return 1;
> }
> 
> static void clear_memalloc(int memalloc)
> {
>         if (memalloc)
>                 current->flags &= ~PF_MEMALLOC;
> }
> 
> and then set/clear PF_MEMALLOC around every memory allocation and
> function that descends into a memory allocation. As said I prefer my
> solution but if someone wants to put together a patch with this
> approach, fine by me.

The mutex_trylock(ashmem_mutex) will actually have the best
performance, because it skips the least amount of memory reclaim
opportunities.

But it still sucks!  The real problem is that there exists a lock
called "ashmem_mutex", taken by both the high-level mmap() and by the
low-level shrinker.  And taken by everything else too!  The ashmem
locking is pretty crude...

What is the mutex_lock() in ashmem_mmap() actually protecting?  I don't
see much, apart from perhaps some incidental races around the contents
of the file's ashmem_area, and those could/should be protected by a
per-object lock, not a global one?

* Re: [PATCH -next] ashmem: Fix ashmem_shrink deadlock.
  2013-05-16 16:45           ` Andrew Morton
@ 2013-05-16 17:08             ` Robert Love
  2013-05-16 17:19               ` Andrew Morton
  0 siblings, 1 reply; 15+ messages in thread
From: Robert Love @ 2013-05-16 17:08 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Raul Xiong, Neil Zhang, Greg Kroah-Hartman,
	Shankar Brahadeeswaran, Dan Carpenter, LKML, Bjorn Bringert,
	devel, Hugh Dickins, Anjana V Kumar, linux-next

On Thu, May 16, 2013 at 12:45 PM, Andrew Morton
<akpm@linux-foundation.org> wrote:
> A better approach would be to add a new __GFP_NOSHRINKERS, but it's all
> variations on a theme.

I don't like this proposal, either. Many of the existing GFP flags
already exist to prevent recurse into that flag's respective shrinker.

This problem seems a rare proper use of mutex_trylock.

> The mutex_trylock(ashmem_mutex) will actually have the best
> performance, because it skips the least amount of memory reclaim
> opportunities.

Right.

> But it still sucks!  The real problem is that there exists a lock
> called "ashmem_mutex", taken by both the high-level mmap() and by the
> low-level shrinker.  And taken by everything else too!  The ashmem
> locking is pretty crude...

The locking is "crude" because I optimized for space, not time, and
there was (and is) no indication we were suffering lock contention due
to the global lock. I haven't thought through the implications of
pushing locking into the ashmem_area and ashmem_range objects, but it
does look like we'd end up often grabbing all of the locks ...

> What is the mutex_lock() in ashmem_mmap() actually protecting?  I don't
> see much, apart from perhaps some incidental races around the contents
> of the file's ashmem_area, and those could/should be protected by a
> per-object lock, not a global one?

... but not, as you note, in ashmem_mmap. The main race there is
around the allocation of asma->file. That could definitely be a lock
local to ashmem_area. I'm OK if anyone wants to take that on but it
seems a lot of work for a driver with an unclear future.

        Robert

* Re: [PATCH -next] ashmem: Fix ashmem_shrink deadlock.
  2013-05-16 17:08             ` Robert Love
@ 2013-05-16 17:19               ` Andrew Morton
  2013-05-16 17:28                 ` Robert Love
  0 siblings, 1 reply; 15+ messages in thread
From: Andrew Morton @ 2013-05-16 17:19 UTC (permalink / raw)
  To: Robert Love
  Cc: Raul Xiong, Neil Zhang, Greg Kroah-Hartman,
	Shankar Brahadeeswaran, Dan Carpenter, LKML, Bjorn Bringert,
	devel, Hugh Dickins, Anjana V Kumar, linux-next

On Thu, 16 May 2013 13:08:17 -0400 Robert Love <rlove@google.com> wrote:

> On Thu, May 16, 2013 at 12:45 PM, Andrew Morton
> <akpm@linux-foundation.org> wrote:
> > A better approach would be to add a new __GFP_NOSHRINKERS, but it's all
> > variations on a theme.
> 
> I don't like this proposal, either. Many of the existing GFP flags
> already exist to prevent recurse into that flag's respective shrinker.
> 
> This problem seems a rare proper use of mutex_trylock.

Not really.  The need for a trylock is often an indication that a
subsystem has a locking misdesign.  That is indeed the case here.

> > The mutex_trylock(ashmem_mutex) will actually have the best
> > performance, because it skips the least amount of memory reclaim
> > opportunities.
> 
> Right.
> 
> > But it still sucks!  The real problem is that there exists a lock
> > called "ashmem_mutex", taken by both the high-level mmap() and by the
> > low-level shrinker.  And taken by everything else too!  The ashmem
> > locking is pretty crude...
> 
> The locking is "crude" because I optimized for space, not time, and
> there was (and is) no indication we were suffering lock contention due
> to the global lock. I haven't thought through the implications of
> pushing locking into the ashmem_area and ashmem_range objects, but it
> does look like we'd end up often grabbing all of the locks ...
> 
> > What is the mutex_lock() in ashmem_mmap() actually protecting?  I don't
> > see much, apart from perhaps some incidental races around the contents
> > of the file's ashmem_area, and those could/should be protected by a
> > per-object lock, not a global one?
> 
> ... but not, as you note, in ashmem_mmap. The main race there is
> around the allocation of asma->file. That could definitely be a lock
> local to ashmem_area. I'm OK if anyone wants to take that on but it
> seems a lot of work for a driver with an unclear future.

Well, it's not exactly a ton of work, but adding a per-ashmem_area lock
to protect ->file would rather be putting lipstick on a pig.  I suppose
we can put the trylock in there and run away, but it wouldn't hurt to
drop in a big fat comment somewhere explaining that the driver should be
migrated to a per-object locking scheme.

* Re: [PATCH -next] ashmem: Fix ashmem_shrink deadlock.
  2013-05-16 17:19               ` Andrew Morton
@ 2013-05-16 17:28                 ` Robert Love
  2013-09-17  5:05                   ` Raul Xiong
  0 siblings, 1 reply; 15+ messages in thread
From: Robert Love @ 2013-05-16 17:28 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Raul Xiong, Neil Zhang, Greg Kroah-Hartman,
	Shankar Brahadeeswaran, Dan Carpenter, LKML, Bjorn Bringert,
	devel, Hugh Dickins, Anjana V Kumar, linux-next

On Thu, May 16, 2013 at 1:19 PM, Andrew Morton
<akpm@linux-foundation.org> wrote:
> On Thu, 16 May 2013 13:08:17 -0400 Robert Love <rlove@google.com> wrote:
>> This problem seems a rare proper use of mutex_trylock.
>
> Not really.  The need for a trylock is often an indication that a
> subsystem has a locking misdesign.  That is indeed the case here.

It is exactly the same as PF_MEMALLOC. We've got an effectively
asynchronous event (shrinking) that can occur while you are holding
locks requisite to that shrinking. Given that the shrinkage is best
effort, a trylock actually communicates the intent pretty well: "If
possible, grab this lock and shrink."

I think the idiomatic fix is to introduce a GFP_SHMEM but that seems
overkill. Lots of the GFP flags are really just preventing recursing
into the shrinkage code and it seems ill-designed that we require
developers to know where they might end up. But we can disagree. :)

> Well, it's not exactly a ton of work, but adding a per-ashmem_area lock
> to protect ->file would rather be putting lipstick on a pig.  I suppose
> we can put the trylock in there and run away, but it wouldn't hurt to
> drop in a big fat comment somewhere explaining that the driver should be
> migrated to a per-object locking scheme.

Unfortunately I think ashmem_shrink would need to grab the per-object
lock too; it needs to update the ranges. I'm sure we could re-design
this but I don't think it is as easy as simply pushing the locking
into the objects.

       Robert

* Re: [PATCH -next] ashmem: Fix ashmem_shrink deadlock.
  2013-05-16 17:28                 ` Robert Love
@ 2013-09-17  5:05                   ` Raul Xiong
  0 siblings, 0 replies; 15+ messages in thread
From: Raul Xiong @ 2013-09-17  5:05 UTC (permalink / raw)
  To: Robert Love
  Cc: Andrew Morton, Neil Zhang, Greg Kroah-Hartman,
	Shankar Brahadeeswaran, Dan Carpenter, LKML, Bjorn Bringert,
	devel, Hugh Dickins, Anjana V Kumar, linux-next

2013/5/17 Robert Love <rlove@google.com>:
> On Thu, May 16, 2013 at 1:19 PM, Andrew Morton
> <akpm@linux-foundation.org> wrote:
>> On Thu, 16 May 2013 13:08:17 -0400 Robert Love <rlove@google.com> wrote:
>>> This problem seems a rare proper use of mutex_trylock.
>>
>> Not really.  The need for a trylock is often an indication that a
>> subsystem has a locking misdesign.  That is indeed the case here.
>
> It is exactly the same as PF_MEMALLOC. We've got an effectively
> asynchronous event (shrinking) that can occur while you are holding
> locks requisite to that shrinking. Given that the shrinkage is best
> effort, a trylock actually communicates the intent pretty well: "If
> possible, grab this lock and shrink."
>
> I think the idiomatic fix is to introduce a GFP_SHMEM but that seems
> overkill. Lots of the GFP flags are really just preventing recursing
> into the shrinkage code and it seems ill-designed that we require
> developers to know where they might end up. But we can disagree. :)
>
>> Well, it's not exactly a ton of work, but adding a per-ashmem_area lock
>> to protect ->file would rather be putting lipstick on a pig.  I suppose
>> we can put the trylock in there and run away, but it wouldn't hurt to
>> drop in a big fat comment somewhere explaining that the driver should be
>> migrated to a per-object locking scheme.
>
> Unfortunately I think ashmem_shrink would need to grab the per-object
> lock too; it needs to update the ranges. I'm sure we could re-design
> this but I don't think it is as easy as simply pushing the locking
> into the objects.
>
>        Robert

Hi all,
I am wondering whether this has been fixed in the latest kernel? We are
still seeing this deadlock issue.

Best Regards,
Raul

end of thread, other threads:[~2013-09-17  5:05 UTC | newest]

Thread overview: 15+ messages
2013-05-01 13:56 [PATCH -next] ashmem: Fix ashmem_shrink deadlock Robert Love
2013-05-01 15:54 ` David Rientjes
2013-05-02 18:22   ` David Rientjes
2013-05-02 20:39     ` Greg Kroah-Hartman
2013-05-07 20:52 ` Andrew Morton
2013-05-13 21:42 ` Greg Kroah-Hartman
2013-05-14  3:29   ` Neil Zhang
2013-05-14  3:37     ` Raul Xiong
2013-05-16  8:15       ` Raul Xiong
2013-05-16 13:44         ` Robert Love
2013-05-16 16:45           ` Andrew Morton
2013-05-16 17:08             ` Robert Love
2013-05-16 17:19               ` Andrew Morton
2013-05-16 17:28                 ` Robert Love
2013-09-17  5:05                   ` Raul Xiong
