From: Andrew Cooper <andrew.cooper3@citrix.com>
To: Julien Grall <julien.grall@arm.com>,
Aaron Cornelius <Aaron.Cornelius@dornerworks.com>,
Xen-devel <xen-devel@lists.xenproject.org>,
Stefano Stabellini <sstabellini@kernel.org>
Subject: Re: Xen 4.7 crash
Date: Wed, 1 Jun 2016 23:26:29 +0100
Message-ID: <95d770bb-25cb-ca78-96f8-501d17015796@citrix.com>
In-Reply-To: <75b9c560-d7db-c636-251d-8bf36bad5ae2@arm.com>
On 01/06/2016 23:18, Julien Grall wrote:
> Hi Andrew,
>
> On 01/06/2016 22:24, Andrew Cooper wrote:
>> On 01/06/2016 21:45, Aaron Cornelius wrote:
>>>>
>>>>> However, since I only have 1 domain active at a time, I'm not sure
>>>>> why I should run out of VM IDs.
>>>>
>>>> Sounds like a VMID resource leak. Check to see whether it is freed
>>>> properly
>>>> in domain_destroy().
>>>>
>>>> ~Andrew
>>> That would be my assumption. But as far as I can tell,
>>> arch_domain_destroy() calls p2m_teardown() which calls
>>> p2m_free_vmid(), and none of the functionality related to freeing a
>>> VM ID appears to have changed in years.
>>
>> The VMID handling looks suspect. It can be called repeatedly during
>> domain destruction, and it will repeatedly clear the same bit out of the
>> vmid_mask.
>
> Can you explain how p2m_free_vmid can be called multiple times?
>
> We have the following path:
> arch_domain_destroy -> p2m_teardown -> p2m_free_vmid.
>
> And I can find only 3 calls to arch_domain_destroy, which should only be
> done once per domain.
>
> If arch_domain_destroy is called multiple times, p2m_free_vmid will not
> be the only place where Xen will be in trouble.
You are correct. I was getting my phases of domain destruction mixed
up. arch_domain_destroy() is called strictly once, after the RCU
reference of the domain has dropped to 0.
>
>> diff --git a/xen/arch/arm/p2m.c b/xen/arch/arm/p2m.c
>> index 838d004..7adb39a 100644
>> --- a/xen/arch/arm/p2m.c
>> +++ b/xen/arch/arm/p2m.c
>> @@ -1393,7 +1393,10 @@ static void p2m_free_vmid(struct domain *d)
>>      struct p2m_domain *p2m = &d->arch.p2m;
>>      spin_lock(&vmid_alloc_lock);
>>      if ( p2m->vmid != INVALID_VMID )
>> -        clear_bit(p2m->vmid, vmid_mask);
>> +    {
>> +        ASSERT(test_and_clear_bit(p2m->vmid, vmid_mask));
>> +        p2m->vmid = INVALID_VMID;
>> +    }
>>
>>      spin_unlock(&vmid_alloc_lock);
>>  }
>>
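One caveat with putting test_and_clear_bit() inside ASSERT(), for
illustration: assuming ASSERT() is compiled out of non-debug builds (as
the standard C assert() is under NDEBUG), the clear would not happen
there at all. Below is a minimal standalone sketch of the difference.
RELEASE_ASSERT and the demo_* helpers are hypothetical stand-ins for
this example, not Xen code.

```c
#include <assert.h>   /* standard assert() has the same property under NDEBUG */
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for a release-build ASSERT(): the argument is
 * never evaluated, so any side effect inside it is lost. */
#define RELEASE_ASSERT(expr) ((void)0)

#define DEMO_MAX_VMID 256
static uint8_t demo_mask[DEMO_MAX_VMID / 8];

static void demo_set_bit(unsigned int nr)
{
    demo_mask[nr / 8] |= (uint8_t)(1u << (nr % 8));
}

static bool demo_test_bit(unsigned int nr)
{
    return (demo_mask[nr / 8] & (1u << (nr % 8))) != 0;
}

/* Clears the bit and reports whether it was previously set. */
static bool demo_test_and_clear_bit(unsigned int nr)
{
    bool was_set = demo_test_bit(nr);

    demo_mask[nr / 8] &= (uint8_t)~(1u << (nr % 8));
    return was_set;
}

/* Side effect inside the assertion: nothing happens in a release build. */
static void free_vmid_in_assert(unsigned int vmid)
{
    (void)vmid; /* unused once the macro discards its argument */
    RELEASE_ASSERT(demo_test_and_clear_bit(vmid));
}

/* Side effect outside the assertion: the bit is always cleared. */
static void free_vmid_checked(unsigned int vmid)
{
    bool was_set = demo_test_and_clear_bit(vmid);

    RELEASE_ASSERT(was_set);
    (void)was_set;
}
```

Applied to the hunk above, that would mean storing the result of
test_and_clear_bit() in a local variable and asserting on the variable.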
>> Having said that, I can't explain why that bug would result in the
>> symptoms you are seeing. It is also possible that your issue is memory
>> corruption from a separate source.
>>
>> Can you see about instrumenting p2m_alloc_vmid()/p2m_free_vmid() (with
>> vmid_alloc_lock held) to see which vmid is being allocated/freed?
>> After the initial boot of the system, you should see the same vmid being
>> allocated and freed for each of your domains.
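To make the instrumentation suggestion concrete, here is a rough
userspace sketch. It is not the Xen implementation: the bitmap, the
trace_* function names and the INVALID_VMID value of -1 are assumptions
made for the example, and locking is left out because the sketch is
single-threaded.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

#define MAX_VMID     256    /* 8-bit VMID space, as discussed in this thread */
#define INVALID_VMID (-1)   /* hypothetical sentinel for this sketch only */

static uint8_t vmid_bitmap[MAX_VMID / 8];

/* Allocate the lowest free VMID and log it; INVALID_VMID when exhausted. */
static int trace_alloc_vmid(void)
{
    for (int nr = 0; nr < MAX_VMID; nr++) {
        uint8_t bit = (uint8_t)(1u << (nr % 8));

        if (!(vmid_bitmap[nr / 8] & bit)) {
            vmid_bitmap[nr / 8] |= bit;
            printf("alloc vmid %d\n", nr);
            return nr;
        }
    }

    printf("alloc failed: all %d VMIDs in use\n", MAX_VMID);
    return INVALID_VMID;
}

/* Free a VMID and log it, flagging a free of a VMID that was not in use. */
static void trace_free_vmid(int vmid)
{
    if (vmid == INVALID_VMID)
        return;

    assert(vmid >= 0 && vmid < MAX_VMID);

    uint8_t bit = (uint8_t)(1u << (vmid % 8));
    int was_set = (vmid_bitmap[vmid / 8] & bit) != 0;

    vmid_bitmap[vmid / 8] &= (uint8_t)~bit;
    printf("free vmid %d%s\n", vmid, was_set ? "" : " (was not allocated)");
}
```

With only one domain alive at a time, such a log should alternate
between "alloc vmid N" and "free vmid N" for the same N. A steadily
increasing N would point at a leak instead.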
>
> Looking quickly at the log, the domain is dom1101. However, the
> maximum number of VMIDs supported is 256, so the exhaustion might be a
> race somewhere.
>
> I would be interested to get a reproducer. I wrote a script to cycle a
> domain (create/destroy) in a loop, and I have not seen any issue after
> 1200 cycles (and counting).
Given that my previous thought was wrong, I am going to suggest that
some other form of memory corruption is a more likely cause.
~Andrew
_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel