* insanity in ll_dirty_page_discard_warn()
@ 2016-07-28 18:26 Al Viro
2016-07-28 19:25 ` [lustre-devel] " Oleg Drokin
0 siblings, 1 reply; 7+ messages in thread
From: Al Viro @ 2016-07-28 18:26 UTC (permalink / raw)
To: Oleg Drokin; +Cc: linux-fsdevel
/* this can be called inside spin lock so use GFP_ATOMIC. */
buf = (char *)__get_free_page(GFP_ATOMIC);
if (buf) {
dentry = d_find_alias(page->mapping->host);
...
if (dentry)
dput(dentry);
If it *can* be called under a spinlock, you have an obvious problem -
dput() can sleep. d_find_alias() might've picked a hashed dentry with
zero refcount that got unhashed by the time of dput(). Or other references
used to exist, but got dropped by that point...
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: insanity in ll_dirty_page_discard_warn()
2016-07-28 18:26 insanity in ll_dirty_page_discard_warn() Al Viro
@ 2016-07-28 19:25 ` Oleg Drokin
0 siblings, 0 replies; 7+ messages in thread
From: Oleg Drokin @ 2016-07-28 19:25 UTC (permalink / raw)
To: Al Viro
Cc: <linux-fsdevel@vger.kernel.org>,
Lustre Development List, Jinshan Xiong
On Jul 28, 2016, at 2:26 PM, Al Viro wrote:
> /* this can be called inside spin lock so use GFP_ATOMIC. */
> buf = (char *)__get_free_page(GFP_ATOMIC);
> if (buf) {
> dentry = d_find_alias(page->mapping->host);
> ...
> if (dentry)
> dput(dentry);
>
> If it *can* be called under a spinlock, you have an obvious problem -
> dput() can sleep. d_find_alias() might've picked a hashed dentry with
> zero refcount that got unhashed by the time of dput(). Or other references
> used to exist, but got dropped by that point...
Ah, the dput()->dentry_kill()->cpu_relax() I guess?
(the final iput cannot catch us here, I think, because we still have pages
in the mapping)
Hm� So the original reported path was:
ll_dirty_page_discard_warn at ffffffffa0a3d252 [lustre]
vvp_page_completion_common at ffffffffa0a7adfc [lustre]
vvp_page_completion_write_common at ffffffffa0a7ae6b [lustre]
vvp_page_completion_write at ffffffffa0a7b83e [lustre]
cl_page_completion at ffffffffa05eed8f [obdclass]
osc_completion at ffffffffa0880812 [osc]
osc_ap_completion at ffffffffa086a544 [osc]
brw_interpret at ffffffffa0876d69 [osc]
But we don't even have a call to osc_ap_completion from brw_interpret
anymore.
osc_ap_completion() itself has a comment that it is to be called under
cl_loi_list_lock, but then tries to take it itself, so the comment
is definitely stale.
And osc_completion() is called outside of that coverage.
I tend to think the comment is stale now, but need to do some more investigations
before I am 100% sure of that.
Thanks for bringing it to our attention.
^ permalink raw reply [flat|nested] 7+ messages in thread
* [lustre-devel] insanity in ll_dirty_page_discard_warn()
@ 2016-07-28 19:25 ` Oleg Drokin
0 siblings, 0 replies; 7+ messages in thread
From: Oleg Drokin @ 2016-07-28 19:25 UTC (permalink / raw)
To: Al Viro
Cc: <linux-fsdevel@vger.kernel.org>,
Lustre Development List, Jinshan Xiong
On Jul 28, 2016, at 2:26 PM, Al Viro wrote:
> /* this can be called inside spin lock so use GFP_ATOMIC. */
> buf = (char *)__get_free_page(GFP_ATOMIC);
> if (buf) {
> dentry = d_find_alias(page->mapping->host);
> ...
> if (dentry)
> dput(dentry);
>
> If it *can* be called under a spinlock, you have an obvious problem -
> dput() can sleep. d_find_alias() might've picked a hashed dentry with
> zero refcount that got unhashed by the time of dput(). Or other references
> used to exist, but got dropped by that point...
Ah, the dput()->dentry_kill()->cpu_relax() I guess?
(the final iput cannot catch us here, I think, because we still have pages
in the mapping)
Hm? So the original reported path was:
ll_dirty_page_discard_warn at ffffffffa0a3d252 [lustre]
vvp_page_completion_common at ffffffffa0a7adfc [lustre]
vvp_page_completion_write_common at ffffffffa0a7ae6b [lustre]
vvp_page_completion_write at ffffffffa0a7b83e [lustre]
cl_page_completion at ffffffffa05eed8f [obdclass]
osc_completion at ffffffffa0880812 [osc]
osc_ap_completion at ffffffffa086a544 [osc]
brw_interpret at ffffffffa0876d69 [osc]
But we don't even have a call to osc_ap_completion from brw_interpret
anymore.
osc_ap_completion() itself has a comment that it is to be called under
cl_loi_list_lock, but then tries to take it itself, so the comment
is definitely stale.
And osc_completion() is called outside of that coverage.
I tend to think the comment is stale now, but need to do some more investigations
before I am 100% sure of that.
Thanks for bringing it to our attention.
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [lustre-devel] insanity in ll_dirty_page_discard_warn()
2016-07-28 19:25 ` [lustre-devel] " Oleg Drokin
@ 2016-07-29 17:22 ` Ben Evans
-1 siblings, 0 replies; 7+ messages in thread
From: Ben Evans @ 2016-07-29 17:22 UTC (permalink / raw)
To: Oleg Drokin, Al Viro
Cc: <linux-fsdevel@vger.kernel.org>, Lustre Development List
I'm also working on a fix in osc_completion where rc=-ENOMEM, and a bunch
of the asserts are not true since various structures haven't been
initialized yet, so there is definitely some work to be done in the area.
-Ben
On 7/28/16, 3:25 PM, "lustre-devel on behalf of Oleg Drokin"
<lustre-devel-bounces@lists.lustre.org on behalf of oleg.drokin@intel.com>
wrote:
>
>On Jul 28, 2016, at 2:26 PM, Al Viro wrote:
>
>> /* this can be called inside spin lock so use GFP_ATOMIC. */
>> buf = (char *)__get_free_page(GFP_ATOMIC);
>> if (buf) {
>> dentry = d_find_alias(page->mapping->host);
>> ...
>> if (dentry)
>> dput(dentry);
>>
>> If it *can* be called under a spinlock, you have an obvious problem -
>> dput() can sleep. d_find_alias() might've picked a hashed dentry with
>> zero refcount that got unhashed by the time of dput(). Or other
>>references
>> used to exist, but got dropped by that point...
>
>Ah, the dput()->dentry_kill()->cpu_relax() I guess?
>
>(the final iput cannot catch us here, I think, because we still have pages
>in the mapping)
>
>Hm� So the original reported path was:
> ll_dirty_page_discard_warn at ffffffffa0a3d252 [lustre]
> vvp_page_completion_common at ffffffffa0a7adfc [lustre]
> vvp_page_completion_write_common at ffffffffa0a7ae6b [lustre]
> vvp_page_completion_write at ffffffffa0a7b83e [lustre]
> cl_page_completion at ffffffffa05eed8f [obdclass]
> osc_completion at ffffffffa0880812 [osc]
> osc_ap_completion at ffffffffa086a544 [osc]
> brw_interpret at ffffffffa0876d69 [osc]
>
>But we don't even have a call to osc_ap_completion from brw_interpret
>anymore.
>
>osc_ap_completion() itself has a comment that it is to be called under
>cl_loi_list_lock, but then tries to take it itself, so the comment
>is definitely stale.
>And osc_completion() is called outside of that coverage.
>
>I tend to think the comment is stale now, but need to do some more
>investigations
>before I am 100% sure of that.
>
>Thanks for bringing it to our attention.
>_______________________________________________
>lustre-devel mailing list
>lustre-devel@lists.lustre.org
>http://lists.lustre.org/listinfo.cgi/lustre-devel-lustre.org
^ permalink raw reply [flat|nested] 7+ messages in thread
* [lustre-devel] insanity in ll_dirty_page_discard_warn()
@ 2016-07-29 17:22 ` Ben Evans
0 siblings, 0 replies; 7+ messages in thread
From: Ben Evans @ 2016-07-29 17:22 UTC (permalink / raw)
To: Oleg Drokin, Al Viro
Cc: <linux-fsdevel@vger.kernel.org>, Lustre Development List
I'm also working on a fix in osc_completion where rc=-ENOMEM, and a bunch
of the asserts are not true since various structures haven't been
initialized yet, so there is definitely some work to be done in the area.
-Ben
On 7/28/16, 3:25 PM, "lustre-devel on behalf of Oleg Drokin"
<lustre-devel-bounces at lists.lustre.org on behalf of oleg.drokin@intel.com>
wrote:
>
>On Jul 28, 2016, at 2:26 PM, Al Viro wrote:
>
>> /* this can be called inside spin lock so use GFP_ATOMIC. */
>> buf = (char *)__get_free_page(GFP_ATOMIC);
>> if (buf) {
>> dentry = d_find_alias(page->mapping->host);
>> ...
>> if (dentry)
>> dput(dentry);
>>
>> If it *can* be called under a spinlock, you have an obvious problem -
>> dput() can sleep. d_find_alias() might've picked a hashed dentry with
>> zero refcount that got unhashed by the time of dput(). Or other
>>references
>> used to exist, but got dropped by that point...
>
>Ah, the dput()->dentry_kill()->cpu_relax() I guess?
>
>(the final iput cannot catch us here, I think, because we still have pages
>in the mapping)
>
>Hm? So the original reported path was:
> ll_dirty_page_discard_warn at ffffffffa0a3d252 [lustre]
> vvp_page_completion_common at ffffffffa0a7adfc [lustre]
> vvp_page_completion_write_common at ffffffffa0a7ae6b [lustre]
> vvp_page_completion_write at ffffffffa0a7b83e [lustre]
> cl_page_completion at ffffffffa05eed8f [obdclass]
> osc_completion at ffffffffa0880812 [osc]
> osc_ap_completion at ffffffffa086a544 [osc]
> brw_interpret at ffffffffa0876d69 [osc]
>
>But we don't even have a call to osc_ap_completion from brw_interpret
>anymore.
>
>osc_ap_completion() itself has a comment that it is to be called under
>cl_loi_list_lock, but then tries to take it itself, so the comment
>is definitely stale.
>And osc_completion() is called outside of that coverage.
>
>I tend to think the comment is stale now, but need to do some more
>investigations
>before I am 100% sure of that.
>
>Thanks for bringing it to our attention.
>_______________________________________________
>lustre-devel mailing list
>lustre-devel at lists.lustre.org
>http://lists.lustre.org/listinfo.cgi/lustre-devel-lustre.org
^ permalink raw reply [flat|nested] 7+ messages in thread
* [lustre-devel] insanity in ll_dirty_page_discard_warn()
2016-07-29 17:22 ` Ben Evans
(?)
@ 2016-08-01 7:54 ` DEGREMONT Aurelien
2016-08-01 13:14 ` Ben Evans
-1 siblings, 1 reply; 7+ messages in thread
From: DEGREMONT Aurelien @ 2016-08-01 7:54 UTC (permalink / raw)
To: lustre-devel
Le 29/07/2016 ? 19:22, Ben Evans a ?crit :
> I'm also working on a fix in osc_completion where rc=-ENOMEM, and a bunch
> of the asserts are not true since various structures haven't been
> initialized yet, so there is definitely some work to be done in the area.
Ben, did you notice?
https://jira.hpdd.intel.com/browse/LU-8435
Do you have a public tracking of your progress in this area? (jira,
gerrit, ...
Aur?lien
^ permalink raw reply [flat|nested] 7+ messages in thread
* [lustre-devel] insanity in ll_dirty_page_discard_warn()
2016-08-01 7:54 ` DEGREMONT Aurelien
@ 2016-08-01 13:14 ` Ben Evans
0 siblings, 0 replies; 7+ messages in thread
From: Ben Evans @ 2016-08-01 13:14 UTC (permalink / raw)
To: lustre-devel
No, I'd looked for that failure before, but there was nothing publicly
available.
I finally have a dev setup to reproduce this error, so I'll be working on
the fix this week.
-Ben
On 8/1/16, 3:54 AM, "DEGREMONT Aurelien" <aurelien.degremont@cea.fr> wrote:
>Le 29/07/2016 ? 19:22, Ben Evans a ?crit :
>> I'm also working on a fix in osc_completion where rc=-ENOMEM, and a
>>bunch
>> of the asserts are not true since various structures haven't been
>> initialized yet, so there is definitely some work to be done in the
>>area.
>Ben, did you notice?
>https://jira.hpdd.intel.com/browse/LU-8435
>
>Do you have a public tracking of your progress in this area? (jira,
>gerrit, ...
>
>
>Aur?lien
^ permalink raw reply [flat|nested] 7+ messages in thread
end of thread, other threads:[~2016-08-01 13:14 UTC | newest]
Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-07-28 18:26 insanity in ll_dirty_page_discard_warn() Al Viro
2016-07-28 19:25 ` Oleg Drokin
2016-07-28 19:25 ` [lustre-devel] " Oleg Drokin
2016-07-29 17:22 ` Ben Evans
2016-07-29 17:22 ` Ben Evans
2016-08-01 7:54 ` DEGREMONT Aurelien
2016-08-01 13:14 ` Ben Evans
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.