From: Andrew Cooper <andrew.cooper3@citrix.com>
To: Julien Grall <julien.grall@arm.com>,
	Aaron Cornelius <Aaron.Cornelius@dornerworks.com>,
	Xen-devel <xen-devel@lists.xenproject.org>,
	Stefano Stabellini <sstabellini@kernel.org>
Subject: Re: Xen 4.7 crash
Date: Wed, 1 Jun 2016 23:26:29 +0100
Message-ID: <95d770bb-25cb-ca78-96f8-501d17015796@citrix.com>
In-Reply-To: <75b9c560-d7db-c636-251d-8bf36bad5ae2@arm.com>

On 01/06/2016 23:18, Julien Grall wrote:
> Hi Andrew,
>
> On 01/06/2016 22:24, Andrew Cooper wrote:
>> On 01/06/2016 21:45, Aaron Cornelius wrote:
>>>>
>>>>> However, since I only have 1 domain active at a time, I'm not sure
>>>>> why I should run out of VM IDs.
>>>>
>>>> Sounds like a VMID resource leak.  Check to see whether it is freed
>>>> properly in domain_destroy().
>>>>
>>>> ~Andrew
>>> That would be my assumption.  But as far as I can tell,
>>> arch_domain_destroy() calls p2m_teardown() which calls
>>> p2m_free_vmid(), and none of the functionality related to freeing a
>>> VM ID appears to have changed in years.
>>
>> The VMID handling looks suspect.  It can be called repeatedly during
>> domain destruction, and it will repeatedly clear the same bit out of the
>> vmid_mask.
>
> Can you explain how p2m_free_vmid can be called multiple times?
>
> We have the following path:
>    arch_domain_destroy -> p2m_teardown -> p2m_free_vmid.
>
> And I can find only 3 calls of arch_domain_destroy, which should only
> be done once per domain.
>
> If arch_domain_destroy is called multiple times, p2m_free_vmid will
> not be the only place where Xen will be in trouble.

You are correct.  I was getting my phases of domain destruction mixed
up.  arch_domain_destroy() is strictly once, after the RCU reference of
the domain has dropped to 0.
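
To illustrate the "strictly once" property, here is a minimal userspace
sketch, not the actual Xen code.  All names (model_domain,
model_get_domain, model_put_domain, model_arch_domain_destroy) are
hypothetical, and the RCU deferral is left out: the only point is that
teardown is tied to the single transition of the reference count from
1 to 0.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical model of a domain; not Xen's struct domain. */
struct model_domain {
    atomic_int refcnt;        /* outstanding references to the domain */
    int arch_destroy_calls;   /* how many times the final teardown ran */
};

/* Stand-in for the final teardown; only counts its invocations. */
static void model_arch_domain_destroy(struct model_domain *d)
{
    d->arch_destroy_calls++;
}

/* Take an extra reference on the domain. */
static void model_get_domain(struct model_domain *d)
{
    atomic_fetch_add(&d->refcnt, 1);
}

/*
 * Drop one reference.  atomic_fetch_sub() returns the previous value,
 * so exactly one caller can observe the 1 -> 0 transition and run the
 * teardown.  Returns true if this call dropped the last reference.
 */
static bool model_put_domain(struct model_domain *d)
{
    if ( atomic_fetch_sub(&d->refcnt, 1) != 1 )
        return false;

    model_arch_domain_destroy(d);
    return true;
}
```

With a count of 2, the first model_put_domain() does nothing and only
the second one runs the teardown.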

>
>> diff --git a/xen/arch/arm/p2m.c b/xen/arch/arm/p2m.c
>> index 838d004..7adb39a 100644
>> --- a/xen/arch/arm/p2m.c
>> +++ b/xen/arch/arm/p2m.c
>> @@ -1393,7 +1393,10 @@ static void p2m_free_vmid(struct domain *d)
>>      struct p2m_domain *p2m = &d->arch.p2m;
>>      spin_lock(&vmid_alloc_lock);
>>      if ( p2m->vmid != INVALID_VMID )
>> -        clear_bit(p2m->vmid, vmid_mask);
>> +    {
>> +        ASSERT(test_and_clear_bit(p2m->vmid, vmid_mask));
>> +        p2m->vmid = INVALID_VMID;
>> +    }
>>
>>      spin_unlock(&vmid_alloc_lock);
>>  }
>>
>> Having said that, I can't explain why that bug would result in the
>> symptoms you are seeing.  It is also possible that your issue is memory
>> corruption from a separate source.
>>
>> Can you see about instrumenting p2m_alloc_vmid()/p2m_free_vmid() (with
>> vmid_alloc_lock held) to see which vmid is being allocated/freed?
>> After the initial boot of the system, you should see the same vmid being
>> allocated and freed for each of your domains.
>
> Looking quickly at the log, the domain is dom1101. However, the
> maximum number of VMIDs supported is 256, so the exhaustion might be a
> race somewhere.
>
> I would be interested to get a reproducer. I wrote a script to cycle a
> domain (create/destroy) in a loop, and I have not seen any issue after
> 1200 cycles (and counting).

Given that my previous thought was wrong, I am going to suggest that
some other form of memory corruption is a more likely cause.
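
The instrumentation suggested above is still worth doing.  As a rough
idea of what the trace should look like, here is a userspace model of a
256-entry VMID bitmap with logging on alloc and free.  This is a sketch
under assumptions, not the code in xen/arch/arm/p2m.c: the model_*
names and the test_and_set()/test_and_clear() helpers are hypothetical,
INVALID_VMID is assumed to be 0 and reserved, and there is no locking.
Note that the test-and-clear is done outside of any assertion here; a
standard C assert() is not evaluated at all when NDEBUG is defined, and
if Xen's ASSERT() likewise compiles out in non-debug builds, the patch
quoted above would need the same treatment.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stdio.h>

#define MAX_VMID      256  /* 8-bit VMID space, as discussed in the thread */
#define INVALID_VMID  0    /* assumption: 0 is reserved, never handed out */
#define BITS_PER_WORD (sizeof(unsigned long) * CHAR_BIT)
#define MASK_WORDS    ((MAX_VMID + BITS_PER_WORD - 1) / BITS_PER_WORD)

static unsigned long vmid_mask[MASK_WORDS];

/* Set bit nr and return its previous value. */
static bool test_and_set(unsigned int nr)
{
    unsigned long bit = 1UL << (nr % BITS_PER_WORD);
    unsigned long *word = &vmid_mask[nr / BITS_PER_WORD];
    bool old = (*word & bit) != 0;

    *word |= bit;
    return old;
}

/* Clear bit nr and return its previous value. */
static bool test_and_clear(unsigned int nr)
{
    unsigned long bit = 1UL << (nr % BITS_PER_WORD);
    unsigned long *word = &vmid_mask[nr / BITS_PER_WORD];
    bool old = (*word & bit) != 0;

    *word &= ~bit;
    return old;
}

/* Returns the lowest free VMID, or INVALID_VMID if none is left. */
static unsigned int model_alloc_vmid(void)
{
    for ( unsigned int nr = INVALID_VMID + 1; nr < MAX_VMID; nr++ )
        if ( !test_and_set(nr) )
        {
            fprintf(stderr, "alloc vmid %u\n", nr);
            return nr;
        }

    fprintf(stderr, "vmid space exhausted\n");
    return INVALID_VMID;
}

/*
 * Frees *vmid and resets it to INVALID_VMID.  Returns true on a clean
 * free, false if the VMID was invalid or was not marked as allocated.
 */
static bool model_free_vmid(unsigned int *vmid)
{
    bool was_set;

    if ( *vmid == INVALID_VMID || *vmid >= MAX_VMID )
        return false;

    was_set = test_and_clear(*vmid);  /* side effect kept out of assert() */
    fprintf(stderr, "free vmid %u%s\n", *vmid,
            was_set ? "" : " (was not allocated!)");
    *vmid = INVALID_VMID;
    return was_set;
}
```

With only one domain alive at a time, this model hands out the same
VMID on every create/destroy cycle, however high the domain ID climbs.
A trace from the real system that shows the allocated VMID creeping
upwards would point at a missing free; a trace that looks like the
model would point elsewhere.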

~Andrew