From: Hyman Huang <huangy81@chinatelecom.cn>
To: Peter Xu <peterx@redhat.com>
Cc: "Eduardo Habkost" <eduardo@habkost.net>,
	"David Hildenbrand" <david@redhat.com>,
	"Juan Quintela" <quintela@redhat.com>,
	"Richard Henderson" <richard.henderson@linaro.org>,
	qemu-devel <qemu-devel@nongnu.org>,
	"Markus ArmBruster" <armbru@redhat.com>,
	"Paolo Bonzini" <pbonzini@redhat.com>,
	"Philippe Mathieu-Daudé" <philmd@redhat.com>,
	"Dr. David Alan Gilbert" <dgilbert@redhat.com>
Subject: Re: [PATCH v11 1/4] migration/dirtyrate: refactor dirty page rate calculation
Date: Mon, 24 Jan 2022 17:36:47 +0800
Message-ID: <df5ecc84-546e-aee9-fd8e-9265a3130c96@chinatelecom.cn>
In-Reply-To: <Ye4YRsXDfvjuoPsh@xz-m1.local>



On 2022/1/24 11:08, Peter Xu wrote:
> On Sat, Jan 22, 2022 at 11:22:37AM +0800, Hyman Huang wrote:
>>
>>
>> On 2022/1/17 10:19, Peter Xu wrote:
>>> On Wed, Jan 05, 2022 at 01:14:06AM +0800, huangy81@chinatelecom.cn wrote:
>>>> From: Hyman Huang(黄勇) <huangy81@chinatelecom.cn>
>>>>
>>>> +
>>>> +static void vcpu_dirty_stat_collect(VcpuStat *stat,
>>>> +                                    DirtyPageRecord *records,
>>>> +                                    bool start)
>>>> +{
>>>> +    CPUState *cpu;
>>>> +
>>>> +    CPU_FOREACH(cpu) {
>>>> +        if (!start && cpu->cpu_index >= stat->nvcpu) {
>>>> +            /*
>>>> +             * Never go there unless cpu is hot-plugged,
>>>> +             * just ignore in this case.
>>>> +             */
>>>> +            continue;
>>>> +        }
>>>
>>> As commented before, I think the only way to do it right is to not allow cpu
>>> plug/unplug during measurement.
>>>
>>> Say, even if the index didn't get out of range, an unplug event should generate
>>> very strange output for the unplugged cpu.  Please see more below.
>>>
>>>> +        record_dirtypages(records, cpu, start);
>>>> +    }
>>>> +}
>>>> +
>>>> +int64_t vcpu_calculate_dirtyrate(int64_t calc_time_ms,
>>>> +                                 int64_t init_time_ms,
>>>> +                                 VcpuStat *stat,
>>>> +                                 unsigned int flag,
>>>> +                                 bool one_shot)
>>>> +{
>>>> +    DirtyPageRecord *records;
>>>> +    int64_t duration;
>>>> +    int64_t dirtyrate;
>>>> +    int i = 0;
>>>> +
>>>> +    cpu_list_lock();
>>>> +    records = vcpu_dirty_stat_alloc(stat);
>>>> +    vcpu_dirty_stat_collect(stat, records, true);
>>>> +    cpu_list_unlock();
>>>
>>> Continuing from the above - I'm wondering whether we should just keep holding
>>> the lock until vcpu_dirty_stat_collect().
>>>
>>> Yes, we could be holding the lock for a long time because of the sleep, but the
>>> main thread doing the plug will just wait for it to complete, and at least it
>>> is not e.g. a deadlock.
>>>
>>> The other solution is we do cpu_list_unlock() like this, but introduce another
>>> cpu_list_generation_id and bump it after any plug/unplug of a cpu, i.e. whenever
>>> the cpu list changes.
>>>
>>> Then we record cpu generation ID at the entry of this function and retry the
>>> whole measurement if at some point we found generation ID changed (we need to
>>> fetch the gen ID after having the lock, of course).  That could avoid us taking
>>> the cpu list lock during dirty_stat_wait(), but it'll start to complicate cpu
>>> list locking rules.
>>>
>>> The simpler way is still just to take the lock, imho.
>>>
>> Hi Peter, I'm working on this as you suggested, holding the cpu_list_lock
>> during the whole dirty page rate calculation. I found a deadlock when testing
>> the hotplug scenario; the sequence is as follows:
>>
>> calc thread                          qemu main thread
>> 1. take qemu_cpu_list_lock
>>                                      1. take the BQL
>> 2. collect dirty page and wait       2. cpu hotplug
>>                                      3. take qemu_cpu_list_lock
>> 3. take the BQL
>>
>> 4. sync dirty log
>>
>> 5. release the BQL
>>
>> I just recalled that this is one of the reasons why I handle the plug/unplug
>> scenario (the other is that cpu plug may have to wait quite a long time while
>> dirtylimit is in service).
> 
> Ah I should have noticed the bql dependency with cpu list lock before..
> 
> I think having the cpu plug wait for one sec is fine, because the mgmt app
> should be aware of both, so it shouldn't even happen in practice (it's not good
> timing to plug during pre-migration).  However bql is definitely another
> story..  which I agree.
> 
>>
>> It seems that we have two strategies: one is to keep this logic untouched
>> in v12 and add the "cpu_list_generation_id" implementation to the TODO list
>> (once this patchset has been merged, I'll try that out); the other is to
>> introduce "cpu_list_generation_id" right now.
>>
>> Which strategy do you prefer?
> 
> I prefer having the gen_id patch.  The thing is it should be less than 10 lines
> and the logic should be fairly straightforward.  Without it, using this new
> feature always seems risky.
> 
> I hope I didn't overlook any existing mechanism to block cpu plug/unplug for
> some period, though, or we should use it.
> 
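For reference, here is a minimal sketch of how the generation-id retry could
look. It is not the actual QEMU code: cpu_list_mutex, MeasureOps,
measure_stable() and the demo_* helpers are hypothetical names used only for
illustration, and the "hotplug" is simulated from inside the wait hook. The
point it shows is that the list lock is dropped during the wait and the
measurement is retried if the generation id changed in the meantime.

```c
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the CPU list lock and the proposed counter. */
static pthread_mutex_t cpu_list_mutex = PTHREAD_MUTEX_INITIALIZER;
static unsigned int cpu_list_generation_id;

/* Hooks supplied by the caller (assumed interface, for illustration only). */
typedef struct {
    void (*collect)(bool start, void *opaque); /* sample per-vCPU counters */
    void (*wait)(void *opaque);                /* sleep for the calc time */
    void *opaque;
} MeasureOps;

/*
 * Run one measurement, retrying from scratch whenever the CPU list changed
 * between the start and end samples. Returns the number of attempts made.
 */
static int measure_stable(const MeasureOps *ops)
{
    unsigned int gen_id;
    bool changed;
    int attempts = 0;

    do {
        attempts++;

        pthread_mutex_lock(&cpu_list_mutex);
        gen_id = cpu_list_generation_id;   /* fetched with the lock held */
        ops->collect(true, ops->opaque);
        pthread_mutex_unlock(&cpu_list_mutex);

        ops->wait(ops->opaque);            /* lock is NOT held while waiting */

        pthread_mutex_lock(&cpu_list_mutex);
        changed = (gen_id != cpu_list_generation_id);
        if (!changed) {
            ops->collect(false, ops->opaque);
        }
        pthread_mutex_unlock(&cpu_list_mutex);
    } while (changed);

    return attempts;
}

/* Demo hooks: count samples and simulate a plug/unplug during the wait. */
typedef struct {
    int plugs_pending;  /* simulated hotplug events still to fire */
    int start_samples;
    int end_samples;
} DemoState;

static void demo_collect(bool start, void *opaque)
{
    DemoState *s = opaque;

    if (start) {
        s->start_samples++;
    } else {
        s->end_samples++;
    }
}

static void demo_wait(void *opaque)
{
    DemoState *s = opaque;

    if (s->plugs_pending > 0) {
        s->plugs_pending--;
        /* The plug/unplug path bumps the id while holding the list lock. */
        pthread_mutex_lock(&cpu_list_mutex);
        cpu_list_generation_id++;
        pthread_mutex_unlock(&cpu_list_mutex);
    }
}
```

With one simulated hotplug, measure_stable() called on a MeasureOps built
from demo_collect/demo_wait throws away the first start sample and succeeds
on the second attempt. Since the lock is never held across the wait, the
calc thread cannot end up holding it while asking for the BQL.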
>>
>> Uh... I think the "unmatched_cnt" is kind of like this too, because once
>> we remove the "unmatched count" logic, the throttle algo is more likely to
>> oscillate, and I'd prefer to add the "unmatched_cnt" to the TODO list as above.
> 
> Could we tune the differential factor to make it less likely to oscillate?
>
Uh... from a certain angle, yes. When the current dirty page rate is close to
the quota while dirtylimit is in service, the throttle reaches a balance. If
the dirty page rate then fluctuates only slightly (not a real oscillation), the
sleep time is adjusted by an amount small enough to be ignored.
> I still can't say I like the "unmatched cnt" idea a lot..  From a PID pov
> (proportional, integral, differential) you've already got proportional +
> differential, and IMHO that "unmatched cnt" solution was trying to mimic an
> "integral" delta.  Instead of doing a mean value calculation (as most integral
> systems do), the "unmatched cnt" solution literally made it an array of 2 and
> dropped the 1st element..  Hence a decision was made only from the 2nd data
> point you collected.
>
> From that POV I think it's cleaner if you add a real (but simple) integral algo
> into it?  It can be e.g. an array of 3, then when you do the math you use the
> average of the three dirty rates.  Would that work (and also look a bit
> cleaner)?
Yes, IMHO this is a more complete algo and we can try it out. So, let's
see the v12 test results and then decide whether the above work should be added
to the TODO list. :)
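
To make the "array of 3" idea concrete, a rough sketch of such an averaging
window could look like the following. The names (DIRTYRATE_WINDOW,
DirtyRateWindow, dirtyrate_window_push, dirtyrate_window_avg) are made up
for illustration and are not part of the patchset; the throttle would
compare the returned average against the quota instead of a single sample.

```c
#include <stdint.h>

#define DIRTYRATE_WINDOW 3 /* hypothetical window size: "an array of 3" */

typedef struct {
    int64_t samples[DIRTYRATE_WINDOW]; /* most recent dirty rates */
    unsigned int count;                /* valid samples, at most the window */
    unsigned int next;                 /* slot the next sample overwrites */
} DirtyRateWindow;

/* Record a new dirty rate sample, overwriting the oldest one when full. */
static void dirtyrate_window_push(DirtyRateWindow *w, int64_t rate)
{
    w->samples[w->next] = rate;
    w->next = (w->next + 1) % DIRTYRATE_WINDOW;
    if (w->count < DIRTYRATE_WINDOW) {
        w->count++;
    }
}

/* Integer mean of the samples collected so far; 0 if there are none. */
static int64_t dirtyrate_window_avg(const DirtyRateWindow *w)
{
    int64_t sum = 0;
    unsigned int i;

    if (w->count == 0) {
        return 0;
    }
    for (i = 0; i < w->count; i++) {
        sum += w->samples[i];
    }
    return sum / (int64_t)w->count;
}
```

A zero-initialized DirtyRateWindow starts empty; after pushing 100, 200 and
300 the average is 200, and a further push replaces the oldest sample.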
> 
> Thanks,
> 

-- 
Best regards

Hyman Huang(黄勇)


