* insanity in ll_dirty_page_discard_warn() @ 2016-07-28 18:26 Al Viro 2016-07-28 19:25 ` [lustre-devel] " Oleg Drokin 0 siblings, 1 reply; 7+ messages in thread From: Al Viro @ 2016-07-28 18:26 UTC (permalink / raw) To: Oleg Drokin; +Cc: linux-fsdevel /* this can be called inside spin lock so use GFP_ATOMIC. */ buf = (char *)__get_free_page(GFP_ATOMIC); if (buf) { dentry = d_find_alias(page->mapping->host); ... if (dentry) dput(dentry); If it *can* be called under a spinlock, you have an obvious problem - dput() can sleep. d_find_alias() might've picked a hashed dentry with zero refcount that got unhashed by the time of dput(). Or other references used to exist, but got dropped by that point... ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: insanity in ll_dirty_page_discard_warn() 2016-07-28 18:26 insanity in ll_dirty_page_discard_warn() Al Viro @ 2016-07-28 19:25 ` Oleg Drokin 0 siblings, 0 replies; 7+ messages in thread From: Oleg Drokin @ 2016-07-28 19:25 UTC (permalink / raw) To: Al Viro Cc: <linux-fsdevel@vger.kernel.org>, Lustre Development List, Jinshan Xiong On Jul 28, 2016, at 2:26 PM, Al Viro wrote: > /* this can be called inside spin lock so use GFP_ATOMIC. */ > buf = (char *)__get_free_page(GFP_ATOMIC); > if (buf) { > dentry = d_find_alias(page->mapping->host); > ... > if (dentry) > dput(dentry); > > If it *can* be called under a spinlock, you have an obvious problem - > dput() can sleep. d_find_alias() might've picked a hashed dentry with > zero refcount that got unhashed by the time of dput(). Or other references > used to exist, but got dropped by that point... Ah, the dput()->dentry_kill()->cpu_relax() I guess? (the final iput cannot catch us here, I think, because we still have pages in the mapping) Hm� So the original reported path was: ll_dirty_page_discard_warn at ffffffffa0a3d252 [lustre] vvp_page_completion_common at ffffffffa0a7adfc [lustre] vvp_page_completion_write_common at ffffffffa0a7ae6b [lustre] vvp_page_completion_write at ffffffffa0a7b83e [lustre] cl_page_completion at ffffffffa05eed8f [obdclass] osc_completion at ffffffffa0880812 [osc] osc_ap_completion at ffffffffa086a544 [osc] brw_interpret at ffffffffa0876d69 [osc] But we don't even have a call to osc_ap_completion from brw_interpret anymore. osc_ap_completion() itself has a comment that it is to be called under cl_loi_list_lock, but then tries to take it itself, so the comment is definitely stale. And osc_completion() is called outside of that coverage. I tend to think the comment is stale now, but need to do some more investigations before I am 100% sure of that. Thanks for bringing it to our attention. ^ permalink raw reply [flat|nested] 7+ messages in thread
* [lustre-devel] insanity in ll_dirty_page_discard_warn() @ 2016-07-28 19:25 ` Oleg Drokin 0 siblings, 0 replies; 7+ messages in thread From: Oleg Drokin @ 2016-07-28 19:25 UTC (permalink / raw) To: Al Viro Cc: <linux-fsdevel@vger.kernel.org>, Lustre Development List, Jinshan Xiong On Jul 28, 2016, at 2:26 PM, Al Viro wrote: > /* this can be called inside spin lock so use GFP_ATOMIC. */ > buf = (char *)__get_free_page(GFP_ATOMIC); > if (buf) { > dentry = d_find_alias(page->mapping->host); > ... > if (dentry) > dput(dentry); > > If it *can* be called under a spinlock, you have an obvious problem - > dput() can sleep. d_find_alias() might've picked a hashed dentry with > zero refcount that got unhashed by the time of dput(). Or other references > used to exist, but got dropped by that point... Ah, the dput()->dentry_kill()->cpu_relax() I guess? (the final iput cannot catch us here, I think, because we still have pages in the mapping) Hm? So the original reported path was: ll_dirty_page_discard_warn at ffffffffa0a3d252 [lustre] vvp_page_completion_common at ffffffffa0a7adfc [lustre] vvp_page_completion_write_common at ffffffffa0a7ae6b [lustre] vvp_page_completion_write at ffffffffa0a7b83e [lustre] cl_page_completion at ffffffffa05eed8f [obdclass] osc_completion at ffffffffa0880812 [osc] osc_ap_completion at ffffffffa086a544 [osc] brw_interpret at ffffffffa0876d69 [osc] But we don't even have a call to osc_ap_completion from brw_interpret anymore. osc_ap_completion() itself has a comment that it is to be called under cl_loi_list_lock, but then tries to take it itself, so the comment is definitely stale. And osc_completion() is called outside of that coverage. I tend to think the comment is stale now, but need to do some more investigations before I am 100% sure of that. Thanks for bringing it to our attention. ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [lustre-devel] insanity in ll_dirty_page_discard_warn() 2016-07-28 19:25 ` [lustre-devel] " Oleg Drokin @ 2016-07-29 17:22 ` Ben Evans -1 siblings, 0 replies; 7+ messages in thread From: Ben Evans @ 2016-07-29 17:22 UTC (permalink / raw) To: Oleg Drokin, Al Viro Cc: <linux-fsdevel@vger.kernel.org>, Lustre Development List I'm also working on a fix in osc_completion where rc=-ENOMEM, and a bunch of the asserts are not true since various structures haven't been initialized yet, so there is definitely some work to be done in the area. -Ben On 7/28/16, 3:25 PM, "lustre-devel on behalf of Oleg Drokin" <lustre-devel-bounces@lists.lustre.org on behalf of oleg.drokin@intel.com> wrote: > >On Jul 28, 2016, at 2:26 PM, Al Viro wrote: > >> /* this can be called inside spin lock so use GFP_ATOMIC. */ >> buf = (char *)__get_free_page(GFP_ATOMIC); >> if (buf) { >> dentry = d_find_alias(page->mapping->host); >> ... >> if (dentry) >> dput(dentry); >> >> If it *can* be called under a spinlock, you have an obvious problem - >> dput() can sleep. d_find_alias() might've picked a hashed dentry with >> zero refcount that got unhashed by the time of dput(). Or other >>references >> used to exist, but got dropped by that point... > >Ah, the dput()->dentry_kill()->cpu_relax() I guess? > >(the final iput cannot catch us here, I think, because we still have pages >in the mapping) > >Hm� So the original reported path was: > ll_dirty_page_discard_warn at ffffffffa0a3d252 [lustre] > vvp_page_completion_common at ffffffffa0a7adfc [lustre] > vvp_page_completion_write_common at ffffffffa0a7ae6b [lustre] > vvp_page_completion_write at ffffffffa0a7b83e [lustre] > cl_page_completion at ffffffffa05eed8f [obdclass] > osc_completion at ffffffffa0880812 [osc] > osc_ap_completion at ffffffffa086a544 [osc] > brw_interpret at ffffffffa0876d69 [osc] > >But we don't even have a call to osc_ap_completion from brw_interpret >anymore. > >osc_ap_completion() itself has a comment that it is to be called under >cl_loi_list_lock, but then tries to take it itself, so the comment >is definitely stale. >And osc_completion() is called outside of that coverage. > >I tend to think the comment is stale now, but need to do some more >investigations >before I am 100% sure of that. > >Thanks for bringing it to our attention. >_______________________________________________ >lustre-devel mailing list >lustre-devel@lists.lustre.org >http://lists.lustre.org/listinfo.cgi/lustre-devel-lustre.org ^ permalink raw reply [flat|nested] 7+ messages in thread
* [lustre-devel] insanity in ll_dirty_page_discard_warn() @ 2016-07-29 17:22 ` Ben Evans 0 siblings, 0 replies; 7+ messages in thread From: Ben Evans @ 2016-07-29 17:22 UTC (permalink / raw) To: Oleg Drokin, Al Viro Cc: <linux-fsdevel@vger.kernel.org>, Lustre Development List I'm also working on a fix in osc_completion where rc=-ENOMEM, and a bunch of the asserts are not true since various structures haven't been initialized yet, so there is definitely some work to be done in the area. -Ben On 7/28/16, 3:25 PM, "lustre-devel on behalf of Oleg Drokin" <lustre-devel-bounces at lists.lustre.org on behalf of oleg.drokin@intel.com> wrote: > >On Jul 28, 2016, at 2:26 PM, Al Viro wrote: > >> /* this can be called inside spin lock so use GFP_ATOMIC. */ >> buf = (char *)__get_free_page(GFP_ATOMIC); >> if (buf) { >> dentry = d_find_alias(page->mapping->host); >> ... >> if (dentry) >> dput(dentry); >> >> If it *can* be called under a spinlock, you have an obvious problem - >> dput() can sleep. d_find_alias() might've picked a hashed dentry with >> zero refcount that got unhashed by the time of dput(). Or other >>references >> used to exist, but got dropped by that point... > >Ah, the dput()->dentry_kill()->cpu_relax() I guess? > >(the final iput cannot catch us here, I think, because we still have pages >in the mapping) > >Hm? So the original reported path was: > ll_dirty_page_discard_warn at ffffffffa0a3d252 [lustre] > vvp_page_completion_common at ffffffffa0a7adfc [lustre] > vvp_page_completion_write_common at ffffffffa0a7ae6b [lustre] > vvp_page_completion_write at ffffffffa0a7b83e [lustre] > cl_page_completion at ffffffffa05eed8f [obdclass] > osc_completion at ffffffffa0880812 [osc] > osc_ap_completion at ffffffffa086a544 [osc] > brw_interpret at ffffffffa0876d69 [osc] > >But we don't even have a call to osc_ap_completion from brw_interpret >anymore. > >osc_ap_completion() itself has a comment that it is to be called under >cl_loi_list_lock, but then tries to take it itself, so the comment >is definitely stale. >And osc_completion() is called outside of that coverage. > >I tend to think the comment is stale now, but need to do some more >investigations >before I am 100% sure of that. > >Thanks for bringing it to our attention. >_______________________________________________ >lustre-devel mailing list >lustre-devel at lists.lustre.org >http://lists.lustre.org/listinfo.cgi/lustre-devel-lustre.org ^ permalink raw reply [flat|nested] 7+ messages in thread
* [lustre-devel] insanity in ll_dirty_page_discard_warn() 2016-07-29 17:22 ` Ben Evans (?) @ 2016-08-01 7:54 ` DEGREMONT Aurelien 2016-08-01 13:14 ` Ben Evans -1 siblings, 1 reply; 7+ messages in thread From: DEGREMONT Aurelien @ 2016-08-01 7:54 UTC (permalink / raw) To: lustre-devel Le 29/07/2016 ? 19:22, Ben Evans a ?crit : > I'm also working on a fix in osc_completion where rc=-ENOMEM, and a bunch > of the asserts are not true since various structures haven't been > initialized yet, so there is definitely some work to be done in the area. Ben, did you notice? https://jira.hpdd.intel.com/browse/LU-8435 Do you have a public tracking of your progress in this area? (jira, gerrit, ... Aur?lien ^ permalink raw reply [flat|nested] 7+ messages in thread
* [lustre-devel] insanity in ll_dirty_page_discard_warn() 2016-08-01 7:54 ` DEGREMONT Aurelien @ 2016-08-01 13:14 ` Ben Evans 0 siblings, 0 replies; 7+ messages in thread From: Ben Evans @ 2016-08-01 13:14 UTC (permalink / raw) To: lustre-devel No, I'd looked for that failure before, but there was nothing publicly available. I finally have a dev setup to reproduce this error, so I'll be working on the fix this week. -Ben On 8/1/16, 3:54 AM, "DEGREMONT Aurelien" <aurelien.degremont@cea.fr> wrote: >Le 29/07/2016 ? 19:22, Ben Evans a ?crit : >> I'm also working on a fix in osc_completion where rc=-ENOMEM, and a >>bunch >> of the asserts are not true since various structures haven't been >> initialized yet, so there is definitely some work to be done in the >>area. >Ben, did you notice? >https://jira.hpdd.intel.com/browse/LU-8435 > >Do you have a public tracking of your progress in this area? (jira, >gerrit, ... > > >Aur?lien ^ permalink raw reply [flat|nested] 7+ messages in thread
end of thread, other threads:[~2016-08-01 13:14 UTC | newest] Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2016-07-28 18:26 insanity in ll_dirty_page_discard_warn() Al Viro 2016-07-28 19:25 ` Oleg Drokin 2016-07-28 19:25 ` [lustre-devel] " Oleg Drokin 2016-07-29 17:22 ` Ben Evans 2016-07-29 17:22 ` Ben Evans 2016-08-01 7:54 ` DEGREMONT Aurelien 2016-08-01 13:14 ` Ben Evans
This is an external index of several public inboxes, see mirroring instructions on how to clone and mirror all data and code used by this external index.