From: Glenn Enright <glenn@rimuhosting.com>
To: "Juergen Gross" <jgross@suse.com>,
"Roger Pau Monné" <roger.pau@citrix.com>
Cc: Jennifer Herbert <Jennifer.Herbert@citrix.com>,
xen-devel@lists.xen.org, Steven Haigh <netwiz@crc.id.au>,
Dietmar Hahn <dietmar.hahn@ts.fujitsu.com>,
Andrew Cooper <andrew.cooper3@citrix.com>
Subject: Re: null domains after xl destroy
Date: Tue, 16 May 2017 12:49:51 +1200
Message-ID: <7f15f2b2-e969-0d5d-cf2f-c234910e1884@rimuhosting.com>
In-Reply-To: <a492a326-b777-52fc-343b-0ed0dd0e9bea@suse.com>

On 15/05/17 21:57, Juergen Gross wrote:
> On 13/05/17 06:02, Glenn Enright wrote:
>> On 09/05/17 21:24, Roger Pau Monné wrote:
>>> On Mon, May 08, 2017 at 11:10:24AM +0200, Juergen Gross wrote:
>>>> On 04/05/17 00:17, Glenn Enright wrote:
>>>>> On 04/05/17 04:58, Steven Haigh wrote:
>>>>>> On 04/05/17 01:53, Juergen Gross wrote:
>>>>>>> On 03/05/17 12:45, Steven Haigh wrote:
>>>>>>>> Just wanted to give this a little nudge now people seem to be back on
>>>>>>>> deck...
>>>>>>>
>>>>>>> Glenn, could you please give the attached patch a try?
>>>>>>>
>>>>>>> It should be applied on top of the other correction, the old debug
>>>>>>> patch should not be applied.
>>>>>>>
>>>>>>> I have added some debug output to make sure we see what is happening.
>>>>>>
>>>>>> This patch is included in kernel-xen-4.9.26-1
>>>>>>
>>>>>> It should be in the repos now.
>>>>>>
>>>>>
>>>>> Still seeing the same issue. Without the extra debug patch all I see in
>>>>> the logs after destroy is this...
>>>>>
>>>>> xen-blkback: xen_blkif_disconnect: busy
>>>>> xen-blkback: xen_blkif_free: delayed = 0
>>>>
>>>> Hmm, to me it seems as if some grant isn't being unmapped.
>>>>
>>>> Looking at gnttab_unmap_refs_async() I wonder how this is supposed to
>>>> work:
>>>>
>>>> I don't see how a grant would ever be unmapped in case of
>>>> page_count(item->pages[pc]) > 1 in __gnttab_unmap_refs_async(). All it
>>>> does is deferring the call to the unmap operation again and again. Or
>>>> am I missing something here?
>>>
>>> No, I don't think you are missing anything, but I cannot see how this can be
>>> solved in a better way, unmapping a page that's still referenced is certainly
>>> not the best option, or else we risk triggering a page-fault elsewhere.
>>>
>>> IMHO, gnttab_unmap_refs_async should have a timeout, and return an error at
>>> some point. Also, I'm wondering whether there's a way to keep track of who has
>>> references on a specific page, but so far I haven't been able to figure out how
>>> to get this information from Linux.
>>>
>>> Also, I've noticed that __gnttab_unmap_refs_async uses page_count, shouldn't it
>>> use page_ref_count instead?
>>>
>>> Roger.
>>>
>>
>> In case it helps, I have continued to work on this. I noticed processes
>> left behind (under 4.9.27). The same issue is ongoing.
>>
>> # ps auxf | grep [x]vda
>> root  2983 0.0 0.0 0 0 ? S 01:44 0:00 \_ [1.xvda1-1]
>> root  5457 0.0 0.0 0 0 ? S 02:06 0:00 \_ [3.xvda1-1]
>> root  7382 0.0 0.0 0 0 ? S 02:36 0:00 \_ [4.xvda1-1]
>> root  9668 0.0 0.0 0 0 ? S 02:51 0:00 \_ [6.xvda1-1]
>> root 11080 0.0 0.0 0 0 ? S 02:57 0:00 \_ [7.xvda1-1]
>>
>> # xl list
>> Name        ID   Mem VCPUs  State   Time(s)
>> Domain-0     0  1512     2  r-----    118.5
>> (null)       1     8     4  --p--d     43.8
>> (null)       3     8     4  --p--d      6.3
>> (null)       4     8     4  --p--d     73.4
>> (null)       6     8     4  --p--d     14.7
>> (null)       7     8     4  --p--d       30
>>
>> Those all have...
>>
>> [root 11080]# cat wchan
>> xen_blkif_schedule
>>
>> [root 11080]# cat stack
>> [<ffffffff814eaee8>] xen_blkif_schedule+0x418/0xb40
>> [<ffffffff810a0555>] kthread+0xe5/0x100
>> [<ffffffff816f1c45>] ret_from_fork+0x25/0x30
>> [<ffffffffffffffff>] 0xffffffffffffffff
>
> And found another reference count bug. Would you like to give the
> attached patch (to be applied additionally to the previous ones) a try?
>
>
> Juergen
>

This seems to have solved the issue in 4.9.28, with all three patches
applied. Awesome!

On my main test machine I can no longer replicate what I was originally
seeing, and in dmesg I now see this flow...

xen-blkback: xen_blkif_disconnect: busy
xen-blkback: xen_blkif_free: delayed = 1
xen-blkback: xen_blkif_free: delayed = 0

xl list is clean, xenstore looks right. No extraneous processes left over.
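
For anyone wanting to script the "xl list is clean" check, here is a
minimal sketch in Python. It is only an illustration, not something used
in this thread: the function name find_null_domains is hypothetical, it
works on captured `xl list` text rather than calling xl itself, and it
assumes the six-column layout quoted above with no spaces in domain
names. The sample rows are taken from the earlier (broken) output.

```python
def find_null_domains(xl_list_output):
    """Return (domid, state) for every row whose name is '(null)'.

    Assumes the column order Name, ID, Mem, VCPUs, State, Time(s),
    as in the `xl list` output quoted earlier in this thread.
    """
    leftovers = []
    for line in xl_list_output.splitlines()[1:]:  # skip the header row
        fields = line.split()
        if len(fields) < 6:
            continue  # ignore blank or truncated lines
        name, domid, state = fields[0], fields[1], fields[4]
        if name == "(null)":
            leftovers.append((int(domid), state))
    return leftovers


# Rows copied from the pre-fix output quoted above.
sample = """Name ID Mem VCPUs State Time(s)
Domain-0 0 1512 2 r----- 118.5
(null) 1 8 4 --p--d 43.8
(null) 7 8 4 --p--d 30
"""

print(find_null_domains(sample))
```

An empty list from this helper corresponds to the clean state described
here; any entries are domains that were destroyed but never fully freed.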

Thank you Juergen, so much. Really appreciate your persistence with this.
Anything I can do to help push this upstream please let me know. Feel
free to add a Reported-by line with my name if you think it appropriate.

Regards, Glenn
http://rimuhosting.com
_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel