All of lore.kernel.org
 help / color / mirror / Atom feed
* insanity in ll_dirty_page_discard_warn()
@ 2016-07-28 18:26 Al Viro
  2016-07-28 19:25   ` [lustre-devel] " Oleg Drokin
  0 siblings, 1 reply; 7+ messages in thread
From: Al Viro @ 2016-07-28 18:26 UTC (permalink / raw)
  To: Oleg Drokin; +Cc: linux-fsdevel

        /* this can be called inside spin lock so use GFP_ATOMIC. */
        buf = (char *)__get_free_page(GFP_ATOMIC);
        if (buf) {
                dentry = d_find_alias(page->mapping->host);
	...
	if (dentry)
		dput(dentry);

If it *can* be called under a spinlock, you have an obvious problem -
dput() can sleep.  d_find_alias() might've picked a hashed dentry with
zero refcount that got unhashed by the time of dput().  Or other references
used to exist, but got dropped by that point...

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: insanity in ll_dirty_page_discard_warn()
  2016-07-28 18:26 insanity in ll_dirty_page_discard_warn() Al Viro
@ 2016-07-28 19:25   ` Oleg Drokin
  0 siblings, 0 replies; 7+ messages in thread
From: Oleg Drokin @ 2016-07-28 19:25 UTC (permalink / raw)
  To: Al Viro
  Cc: <linux-fsdevel@vger.kernel.org>,
	Lustre Development List, Jinshan Xiong


On Jul 28, 2016, at 2:26 PM, Al Viro wrote:

>        /* this can be called inside spin lock so use GFP_ATOMIC. */
>        buf = (char *)__get_free_page(GFP_ATOMIC);
>        if (buf) {
>                dentry = d_find_alias(page->mapping->host);
> 	...
> 	if (dentry)
> 		dput(dentry);
> 
> If it *can* be called under a spinlock, you have an obvious problem -
> dput() can sleep.  d_find_alias() might've picked a hashed dentry with
> zero refcount that got unhashed by the time of dput().  Or other references
> used to exist, but got dropped by that point...

Ah, the dput()->dentry_kill()->cpu_relax() I guess?

(the final iput cannot catch us here, I think, because we still have pages
in the mapping)

Hm� So the original reported path was:
            ll_dirty_page_discard_warn at ffffffffa0a3d252 [lustre]
            vvp_page_completion_common at ffffffffa0a7adfc [lustre]
            vvp_page_completion_write_common at ffffffffa0a7ae6b [lustre]
            vvp_page_completion_write at ffffffffa0a7b83e [lustre]
            cl_page_completion at ffffffffa05eed8f [obdclass]
            osc_completion at ffffffffa0880812 [osc]
            osc_ap_completion at ffffffffa086a544 [osc]
            brw_interpret at ffffffffa0876d69 [osc]

But we don't even have a call to osc_ap_completion from brw_interpret
anymore.

osc_ap_completion() itself has a comment that it is to be called under
cl_loi_list_lock, but then tries to take it itself, so the comment
is definitely stale.
And osc_completion() is called outside of that coverage.

I tend to think the comment is stale now, but need to do some more investigations
before I am 100% sure of that.

Thanks for bringing it to our attention.

^ permalink raw reply	[flat|nested] 7+ messages in thread

* [lustre-devel] insanity in ll_dirty_page_discard_warn()
@ 2016-07-28 19:25   ` Oleg Drokin
  0 siblings, 0 replies; 7+ messages in thread
From: Oleg Drokin @ 2016-07-28 19:25 UTC (permalink / raw)
  To: Al Viro
  Cc: <linux-fsdevel@vger.kernel.org>,
	Lustre Development List, Jinshan Xiong


On Jul 28, 2016, at 2:26 PM, Al Viro wrote:

>        /* this can be called inside spin lock so use GFP_ATOMIC. */
>        buf = (char *)__get_free_page(GFP_ATOMIC);
>        if (buf) {
>                dentry = d_find_alias(page->mapping->host);
> 	...
> 	if (dentry)
> 		dput(dentry);
> 
> If it *can* be called under a spinlock, you have an obvious problem -
> dput() can sleep.  d_find_alias() might've picked a hashed dentry with
> zero refcount that got unhashed by the time of dput().  Or other references
> used to exist, but got dropped by that point...

Ah, the dput()->dentry_kill()->cpu_relax() I guess?

(the final iput cannot catch us here, I think, because we still have pages
in the mapping)

Hm? So the original reported path was:
            ll_dirty_page_discard_warn at ffffffffa0a3d252 [lustre]
            vvp_page_completion_common at ffffffffa0a7adfc [lustre]
            vvp_page_completion_write_common at ffffffffa0a7ae6b [lustre]
            vvp_page_completion_write at ffffffffa0a7b83e [lustre]
            cl_page_completion at ffffffffa05eed8f [obdclass]
            osc_completion at ffffffffa0880812 [osc]
            osc_ap_completion at ffffffffa086a544 [osc]
            brw_interpret at ffffffffa0876d69 [osc]

But we don't even have a call to osc_ap_completion from brw_interpret
anymore.

osc_ap_completion() itself has a comment that it is to be called under
cl_loi_list_lock, but then tries to take it itself, so the comment
is definitely stale.
And osc_completion() is called outside of that coverage.

I tend to think the comment is stale now, but need to do some more investigations
before I am 100% sure of that.

Thanks for bringing it to our attention.

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [lustre-devel] insanity in ll_dirty_page_discard_warn()
  2016-07-28 19:25   ` [lustre-devel] " Oleg Drokin
@ 2016-07-29 17:22     ` Ben Evans
  -1 siblings, 0 replies; 7+ messages in thread
From: Ben Evans @ 2016-07-29 17:22 UTC (permalink / raw)
  To: Oleg Drokin, Al Viro
  Cc: <linux-fsdevel@vger.kernel.org>, Lustre Development List

I'm also working on a fix in osc_completion where rc=-ENOMEM, and a bunch
of the asserts are not true since various structures haven't been
initialized yet, so there is definitely some work to be done in the area.

-Ben

On 7/28/16, 3:25 PM, "lustre-devel on behalf of Oleg Drokin"
<lustre-devel-bounces@lists.lustre.org on behalf of oleg.drokin@intel.com>
wrote:

>
>On Jul 28, 2016, at 2:26 PM, Al Viro wrote:
>
>>        /* this can be called inside spin lock so use GFP_ATOMIC. */
>>        buf = (char *)__get_free_page(GFP_ATOMIC);
>>        if (buf) {
>>                dentry = d_find_alias(page->mapping->host);
>> 	...
>> 	if (dentry)
>> 		dput(dentry);
>> 
>> If it *can* be called under a spinlock, you have an obvious problem -
>> dput() can sleep.  d_find_alias() might've picked a hashed dentry with
>> zero refcount that got unhashed by the time of dput().  Or other
>>references
>> used to exist, but got dropped by that point...
>
>Ah, the dput()->dentry_kill()->cpu_relax() I guess?
>
>(the final iput cannot catch us here, I think, because we still have pages
>in the mapping)
>
>Hm� So the original reported path was:
>            ll_dirty_page_discard_warn at ffffffffa0a3d252 [lustre]
>            vvp_page_completion_common at ffffffffa0a7adfc [lustre]
>            vvp_page_completion_write_common at ffffffffa0a7ae6b [lustre]
>            vvp_page_completion_write at ffffffffa0a7b83e [lustre]
>            cl_page_completion at ffffffffa05eed8f [obdclass]
>            osc_completion at ffffffffa0880812 [osc]
>            osc_ap_completion at ffffffffa086a544 [osc]
>            brw_interpret at ffffffffa0876d69 [osc]
>
>But we don't even have a call to osc_ap_completion from brw_interpret
>anymore.
>
>osc_ap_completion() itself has a comment that it is to be called under
>cl_loi_list_lock, but then tries to take it itself, so the comment
>is definitely stale.
>And osc_completion() is called outside of that coverage.
>
>I tend to think the comment is stale now, but need to do some more
>investigations
>before I am 100% sure of that.
>
>Thanks for bringing it to our attention.
>_______________________________________________
>lustre-devel mailing list
>lustre-devel@lists.lustre.org
>http://lists.lustre.org/listinfo.cgi/lustre-devel-lustre.org


^ permalink raw reply	[flat|nested] 7+ messages in thread

* [lustre-devel] insanity in ll_dirty_page_discard_warn()
@ 2016-07-29 17:22     ` Ben Evans
  0 siblings, 0 replies; 7+ messages in thread
From: Ben Evans @ 2016-07-29 17:22 UTC (permalink / raw)
  To: Oleg Drokin, Al Viro
  Cc: <linux-fsdevel@vger.kernel.org>, Lustre Development List

I'm also working on a fix in osc_completion where rc=-ENOMEM, and a bunch
of the asserts are not true since various structures haven't been
initialized yet, so there is definitely some work to be done in the area.

-Ben

On 7/28/16, 3:25 PM, "lustre-devel on behalf of Oleg Drokin"
<lustre-devel-bounces at lists.lustre.org on behalf of oleg.drokin@intel.com>
wrote:

>
>On Jul 28, 2016, at 2:26 PM, Al Viro wrote:
>
>>        /* this can be called inside spin lock so use GFP_ATOMIC. */
>>        buf = (char *)__get_free_page(GFP_ATOMIC);
>>        if (buf) {
>>                dentry = d_find_alias(page->mapping->host);
>> 	...
>> 	if (dentry)
>> 		dput(dentry);
>> 
>> If it *can* be called under a spinlock, you have an obvious problem -
>> dput() can sleep.  d_find_alias() might've picked a hashed dentry with
>> zero refcount that got unhashed by the time of dput().  Or other
>>references
>> used to exist, but got dropped by that point...
>
>Ah, the dput()->dentry_kill()->cpu_relax() I guess?
>
>(the final iput cannot catch us here, I think, because we still have pages
>in the mapping)
>
>Hm? So the original reported path was:
>            ll_dirty_page_discard_warn at ffffffffa0a3d252 [lustre]
>            vvp_page_completion_common at ffffffffa0a7adfc [lustre]
>            vvp_page_completion_write_common at ffffffffa0a7ae6b [lustre]
>            vvp_page_completion_write at ffffffffa0a7b83e [lustre]
>            cl_page_completion at ffffffffa05eed8f [obdclass]
>            osc_completion at ffffffffa0880812 [osc]
>            osc_ap_completion at ffffffffa086a544 [osc]
>            brw_interpret at ffffffffa0876d69 [osc]
>
>But we don't even have a call to osc_ap_completion from brw_interpret
>anymore.
>
>osc_ap_completion() itself has a comment that it is to be called under
>cl_loi_list_lock, but then tries to take it itself, so the comment
>is definitely stale.
>And osc_completion() is called outside of that coverage.
>
>I tend to think the comment is stale now, but need to do some more
>investigations
>before I am 100% sure of that.
>
>Thanks for bringing it to our attention.
>_______________________________________________
>lustre-devel mailing list
>lustre-devel at lists.lustre.org
>http://lists.lustre.org/listinfo.cgi/lustre-devel-lustre.org

^ permalink raw reply	[flat|nested] 7+ messages in thread

* [lustre-devel] insanity in ll_dirty_page_discard_warn()
  2016-07-29 17:22     ` Ben Evans
  (?)
@ 2016-08-01  7:54     ` DEGREMONT Aurelien
  2016-08-01 13:14       ` Ben Evans
  -1 siblings, 1 reply; 7+ messages in thread
From: DEGREMONT Aurelien @ 2016-08-01  7:54 UTC (permalink / raw)
  To: lustre-devel

Le 29/07/2016 ? 19:22, Ben Evans a ?crit :
> I'm also working on a fix in osc_completion where rc=-ENOMEM, and a bunch
> of the asserts are not true since various structures haven't been
> initialized yet, so there is definitely some work to be done in the area.
Ben, did you notice?
https://jira.hpdd.intel.com/browse/LU-8435

Do you have a public tracking of your progress in this area? (jira, 
gerrit, ...


Aur?lien

^ permalink raw reply	[flat|nested] 7+ messages in thread

* [lustre-devel] insanity in ll_dirty_page_discard_warn()
  2016-08-01  7:54     ` DEGREMONT Aurelien
@ 2016-08-01 13:14       ` Ben Evans
  0 siblings, 0 replies; 7+ messages in thread
From: Ben Evans @ 2016-08-01 13:14 UTC (permalink / raw)
  To: lustre-devel

No, I'd looked for that failure before, but there was nothing publicly
available.

I finally have a dev setup to reproduce this error, so I'll be working on
the fix this week.

-Ben

On 8/1/16, 3:54 AM, "DEGREMONT Aurelien" <aurelien.degremont@cea.fr> wrote:

>Le 29/07/2016 ? 19:22, Ben Evans a ?crit :
>> I'm also working on a fix in osc_completion where rc=-ENOMEM, and a
>>bunch
>> of the asserts are not true since various structures haven't been
>> initialized yet, so there is definitely some work to be done in the
>>area.
>Ben, did you notice?
>https://jira.hpdd.intel.com/browse/LU-8435
>
>Do you have a public tracking of your progress in this area? (jira,
>gerrit, ...
>
>
>Aur?lien

^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2016-08-01 13:14 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-07-28 18:26 insanity in ll_dirty_page_discard_warn() Al Viro
2016-07-28 19:25 ` Oleg Drokin
2016-07-28 19:25   ` [lustre-devel] " Oleg Drokin
2016-07-29 17:22   ` Ben Evans
2016-07-29 17:22     ` Ben Evans
2016-08-01  7:54     ` DEGREMONT Aurelien
2016-08-01 13:14       ` Ben Evans

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.