From: Andrew Cooper <andrew.cooper3@citrix.com>
To: Julien Grall <julien.grall@arm.com>,
	Aaron Cornelius <Aaron.Cornelius@dornerworks.com>,
	Xen-devel <xen-devel@lists.xenproject.org>,
	Stefano Stabellini <sstabellini@kernel.org>
Subject: Re: Xen 4.7 crash
Date: Wed, 1 Jun 2016 23:26:29 +0100
Message-ID: <95d770bb-25cb-ca78-96f8-501d17015796@citrix.com>
In-Reply-To: <75b9c560-d7db-c636-251d-8bf36bad5ae2@arm.com>

On 01/06/2016 23:18, Julien Grall wrote:
> Hi Andrew,
>
> On 01/06/2016 22:24, Andrew Cooper wrote:
>> On 01/06/2016 21:45, Aaron Cornelius wrote:
>>>>
>>>>> However, since I only have 1 domain active at a time, I'm not sure
>>>>> why I should run out of VM IDs.
>>>>
>>>> Sounds like a VMID resource leak.  Check to see whether it is freed
>>>> properly in domain_destroy().
>>>>
>>>> ~Andrew
>>> That would be my assumption.  But as far as I can tell,
>>> arch_domain_destroy() calls p2m_teardown() which calls
>>> p2m_free_vmid(), and none of the functionality related to freeing a
>>> VM ID appears to have changed in years.
>>
>> The VMID handling looks suspect.  It can be called repeatedly during
>> domain destruction, and it will repeatedly clear the same bit out of the
>> vmid_mask.
>
> Can you explain how p2m_free_vmid can be called multiple times?
>
> We have the following path:
>    arch_domain_destroy -> p2m_teardown -> p2m_free_vmid.
>
> And I can find only 3 calls to arch_domain_destroy, each of which
> should only be done once per domain.
>
> If arch_domain_destroy is called multiple times, p2m_free_vmid will not
> be the only place where Xen will be in trouble.

You are correct.  I was getting my phases of domain destruction mixed
up.  arch_domain_destroy() is called strictly once, after the RCU
reference of the domain has dropped to 0.

>
>> diff --git a/xen/arch/arm/p2m.c b/xen/arch/arm/p2m.c
>> index 838d004..7adb39a 100644
>> --- a/xen/arch/arm/p2m.c
>> +++ b/xen/arch/arm/p2m.c
>> @@ -1393,7 +1393,10 @@ static void p2m_free_vmid(struct domain *d)
>>      struct p2m_domain *p2m = &d->arch.p2m;
>>      spin_lock(&vmid_alloc_lock);
>>      if ( p2m->vmid != INVALID_VMID )
>> -        clear_bit(p2m->vmid, vmid_mask);
>> +    {
>> +        ASSERT(test_and_clear_bit(p2m->vmid, vmid_mask));
>> +        p2m->vmid = INVALID_VMID;
>> +    }
>>
>>      spin_unlock(&vmid_alloc_lock);
>>  }
>>
>> Having said that, I can't explain why that bug would result in the
>> symptoms you are seeing.  It is also possible that your issue is memory
>> corruption from a separate source.
>>
>> Can you see about instrumenting p2m_alloc_vmid()/p2m_free_vmid() (with
>> vmid_alloc_lock held) to see which vmid is being allocated/freed ?
>> After the initial boot of the system, you should see the same vmid being
>> allocated and freed for each of your domains.
>
> Looking quickly at the log, the domain is dom1101. However, the
> maximum number of VMIDs supported is 256, so the exhaustion might be a
> race somewhere.
>
> I would be interested to get a reproducer. I wrote a script to cycle a
> domain (create/destroy) in a loop, and I have not seen any issue after
> 1200 cycles (and counting).

Given that my previous thought was wrong, I am going to suggest that
some other form of memory corruption is a more likely cause.
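
For reference, the alloc/free instrumentation suggested above could look
roughly like the standalone sketch below.  It is not the Xen code: the
names vmid_alloc(), vmid_free(), test_and_set() and test_and_clear() are
hypothetical, MAX_VMID and INVALID_VMID are assumed values for the sketch,
and there is no locking because it is single-threaded (in the hypervisor
this would sit under vmid_alloc_lock).

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stdio.h>

#define MAX_VMID      256u  /* assumed: 8-bit VMID space, as discussed above */
#define INVALID_VMID  0u    /* assumed sentinel: VMID 0 is never handed out */

#define BITS_PER_WORD (sizeof(unsigned long) * CHAR_BIT)
#define VMID_WORDS    ((MAX_VMID + BITS_PER_WORD - 1) / BITS_PER_WORD)

/* One bit per VMID; a set bit means "in use". */
static unsigned long vmid_mask[VMID_WORDS];

/* Set bit nr and return its previous value. */
static bool test_and_set(unsigned int nr)
{
    unsigned long bit = 1UL << (nr % BITS_PER_WORD);
    unsigned long *word = &vmid_mask[nr / BITS_PER_WORD];
    bool old = (*word & bit) != 0;

    assert(nr < MAX_VMID);
    *word |= bit;
    return old;
}

/* Clear bit nr and return its previous value. */
static bool test_and_clear(unsigned int nr)
{
    unsigned long bit = 1UL << (nr % BITS_PER_WORD);
    unsigned long *word = &vmid_mask[nr / BITS_PER_WORD];
    bool old = (*word & bit) != 0;

    assert(nr < MAX_VMID);
    *word &= ~bit;
    return old;
}

/* Hand out the lowest free VMID, or INVALID_VMID when all are in use. */
unsigned int vmid_alloc(void)
{
    for (unsigned int nr = 1; nr < MAX_VMID; nr++) {
        if (!test_and_set(nr)) {
            fprintf(stderr, "vmid alloc: %u\n", nr);
            return nr;
        }
    }
    fprintf(stderr, "vmid alloc: exhausted\n");
    return INVALID_VMID;
}

/*
 * Release *vmid and reset it to INVALID_VMID.  Returns true if the bit
 * was set, false if there was nothing to free or the bit was already
 * clear (a double free).
 */
bool vmid_free(unsigned int *vmid)
{
    bool was_set;

    if (*vmid == INVALID_VMID)
        return false;

    /* Done outside any assertion so it still runs with NDEBUG defined. */
    was_set = test_and_clear(*vmid);
    fprintf(stderr, "vmid free: %u%s\n", *vmid,
            was_set ? "" : " (was not allocated!)");
    *vmid = INVALID_VMID;
    return was_set;
}
```

With one domain cycling, the log from something like this should show the
same VMID being allocated and freed every time.  A steadily increasing
VMID would point at a leak, and a "was not allocated" line would point at
a double free.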

~Andrew

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel
