* [Qemu-devel] migrate_set_speed has no effect if the guest is using hugepages. @ 2019-07-11 9:44 Lin Ma 2019-07-11 10:24 ` Dr. David Alan Gilbert 0 siblings, 1 reply; 7+ messages in thread
From: Lin Ma @ 2019-07-11 9:44 UTC (permalink / raw)
To: qemu-devel, dgilbert

Hi all,

When I live migrate a qemu/kvm guest, if the guest is using huge pages, I find that the migrate_set_speed command has no effect during stage 2. It was caused by commit 4c011c3 "postcopy: Send whole huge pages".

I'm wondering whether this is by design or a bug waiting for a fix?

Thanks,
Lin

^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [Qemu-devel] migrate_set_speed has no effect if the guest is using hugepages. 2019-07-11 9:44 [Qemu-devel] migrate_set_speed has no effect if the guest is using hugepages Lin Ma @ 2019-07-11 10:24 ` Dr. David Alan Gilbert [not found] ` <BY5PR18MB331347C441DA068E32BFDE53C5F20@BY5PR18MB3313.namprd18.prod.outlook.com> 0 siblings, 1 reply; 7+ messages in thread
From: Dr. David Alan Gilbert @ 2019-07-11 10:24 UTC (permalink / raw)
To: Lin Ma; +Cc: qemu-devel

* Lin Ma (LMa@suse.com) wrote:
> Hi all,

Hi Lin,

> When I live migrate a qemu/kvm guest, if the guest is using huge pages, I found that
> the migrate_set_speed command had no effect during stage 2.

Can you explain what you mean by 'stage 2'?

> It was caused by commit 4c011c3 postcopy: Send whole huge pages
>
> I'm wondering whether this is by design or a bug waiting for a fix?

This is the first report I've seen of it. How did you conclude that 4c011c3 caused it? While I can see it might have some effect on the bandwidth management, I'm surprised it has this much effect.

What size huge pages are you using - 2MB or 1GB?

I can imagine we might have a problem: since we only do the sleep between the hugepages, if we were using 1GB hugepages then we'd see <big chunk of data>[sleep]<big chunk of data>[sleep], which isn't as smooth as it used to be.

Can you give me some more details of your test?

Dave

> Thanks,
> Lin

--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
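[Archive note: the <big chunk of data>[sleep] pattern described above can be sketched as a toy send loop. The names, and the 100 ms accounting interval, are illustrative assumptions; QEMU's real migration code is more involved.]

```python
def send_ram(num_pages, page_size, rate_limit_bytes_per_s, interval=0.1):
    """Toy migration loop: the rate limit is only checked at huge-page
    boundaries, so each huge page goes out as one unthrottled burst.
    Returns the event trace, e.g. ['send', 'sleep', 'send', 'sleep']."""
    budget = rate_limit_bytes_per_s * interval   # bytes allowed per interval
    trace = []
    sent_this_interval = 0
    for _ in range(num_pages):
        trace.append('send')             # whole huge page, no throttling inside
        sent_this_interval += page_size
        if sent_this_interval >= budget:         # checked only *between* pages
            trace.append('sleep')                # <burst>[sleep]<burst>[sleep]
            sent_this_interval = 0
    return trace

# With 1 GB pages even a modest limit is exceeded by a single page, so the
# trace degenerates to send/sleep pairs: one huge burst per interval.
print(send_ram(3, page_size=1 << 30, rate_limit_bytes_per_s=5 << 20))
# → ['send', 'sleep', 'send', 'sleep', 'send', 'sleep']
```

With 4 KB target pages the same loop sends many pages before each budget check, so the traffic stays comparatively smooth; with 1 GB huge pages the limiter can do nothing finer than whole-page bursts.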
[parent not found: <BY5PR18MB331347C441DA068E32BFDE53C5F20@BY5PR18MB3313.namprd18.prod.outlook.com>]
* Re: [Qemu-devel] Reply: migrate_set_speed has no effect if the guest is using hugepages. [not found] ` <BY5PR18MB331347C441DA068E32BFDE53C5F20@BY5PR18MB3313.namprd18.prod.outlook.com> @ 2019-07-12 12:34 ` Dr. David Alan Gilbert 2019-07-15 9:43 ` [Qemu-devel] Reply: " Lin Ma 0 siblings, 1 reply; 7+ messages in thread
From: Dr. David Alan Gilbert @ 2019-07-12 12:34 UTC (permalink / raw)
To: Lin Ma; +Cc: qemu-devel

* Lin Ma (LMa@suse.com) wrote:
> > -----Original Message-----
> > From: Dr. David Alan Gilbert <dgilbert@redhat.com>
> > Sent: July 11, 2019 18:24
> > To: Lin Ma <LMa@suse.com>
> > Cc: qemu-devel@nongnu.org
> > Subject: Re: [Qemu-devel] migrate_set_speed has no effect if the guest is using hugepages.
> >
> > * Lin Ma (LMa@suse.com) wrote:
> > > Hi all,
> >
> > Hi Lin,
>
> Hi Dave,
>
> > > When I live migrate a qemu/kvm guest, if the guest is using huge pages,
> > > I found that the migrate_set_speed command had no effect during stage 2.
> >
> > Can you explain what you mean by 'stage 2'?
>
> We know that live migration consists of 3 stages:
> Stage 1: Mark all of RAM dirty.
> Stage 2: Keep sending the RAM pages dirtied since the last iteration.
> Stage 3: Stop the guest, transfer the remaining dirty RAM and device state.
> (Please refer to https://developers.redhat.com/blog/2015/03/24/live-migrating-qemu-kvm-virtual-machines/#live-migration for further details)

OK, yeah, the numbering is pretty arbitrary, so it's not something I normally think about like that.

> > > It was caused by commit 4c011c3 postcopy: Send whole huge pages
> > >
> > > I'm wondering whether this is by design or a bug waiting for a fix?
> >
> > This is the first report I've seen of it. How did you conclude that
> > 4c011c3 caused it? While I can see it might have some effect on the
> > bandwidth management, I'm surprised it has this much effect.
>
> While digging into the bandwidth issue, git bisect showed that this commit was the first bad commit.

OK.

> > What size huge pages are you using - 2MB or 1GB?
>
> When I hit this issue I was using a 1GB huge page size.
> I tested this issue with a 2MB page size today on a Gigabit LAN; although the
> bandwidth control looks a little better than with 1GB, it is not much better.
> Please refer to the test results below.

OK, I can certainly see why this might happen with 1GB huge pages; I need to have a think about a fix.

> > I can imagine we might have a problem: since we only do the sleep between
> > the hugepages, if we were using 1GB hugepages then we'd see <big chunk of
> > data>[sleep]<big chunk of data>[sleep], which isn't as smooth as it used to be.
> >
> > Can you give me some more details of your test?
>
> Live migration bandwidth management testing with 2MB hugepage size:
> sles12sp4_i440fx is a qemu/kvm guest with 6GB of memory.
> Note: the throughput values are approximate.
>
> Terminal 1:
> virsh migrate-setspeed sles12sp4_i440fx $bandwidth && virsh migrate --live sles12sp4_i440fx qemu+tcp://5810f/system
>
> Terminal 2:
> virsh qemu-monitor-command sles12sp4_i440fx --hmp "info migrate"
>
> bandwidth=5   throughput: 160 mbps
> bandwidth=10  throughput: 167 mbps
> bandwidth=15  throughput: 168 mbps
> bandwidth=20  throughput: 168 mbps
> bandwidth=21  throughput: 336 mbps
> bandwidth=22  throughput: 336 mbps
> bandwidth=25  throughput: 335.87 mbps
> bandwidth=30  throughput: 335 mbps
> bandwidth=35  throughput: 335 mbps
> bandwidth=40  throughput: 335 mbps
> bandwidth=45  throughput: 504.00 mbps
> bandwidth=50  throughput: 500.00 mbps
> bandwidth=55  throughput: 500.00 mbps
> bandwidth=60  throughput: 500.00 mbps
> bandwidth=65  throughput: 650.00 mbps
> bandwidth=70  throughput: 660.00 mbps

OK, so migrate-setspeed takes a bandwidth in MBytes/sec and I guess your throughput is in MBit/sec - so at the higher end it's about right, and at the lower end it's way off.

Let me think about a fix for this.

What are you using to measure throughput?

Dave

> Thanks,
> Lin

--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
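[Archive note: the steps in the measured throughput (roughly 160, 336, 504, 660 Mbit/s for 2 MB pages) are consistent with a limiter that rounds up to whole huge pages per accounting interval. A rough back-of-the-envelope model follows; the 100 ms interval and the round-up behaviour are assumptions for illustration, not QEMU's exact bookkeeping.]

```python
from math import ceil

def modeled_throughput_mbit(limit_mbyte_per_s, page_mb, interval_s=0.1,
                            wire_mbit_per_s=1000):
    """Effective throughput when whole huge pages are sent per interval."""
    budget_mb = limit_mbyte_per_s * interval_s     # MBytes allowed per tick
    pages = max(1, ceil(budget_mb / page_mb))      # rounded up to whole pages
    mbyte_per_s = pages * page_mb / interval_s
    return min(wire_mbit_per_s, mbyte_per_s * 8)   # capped by the gigabit link

for limit in (5, 20, 21, 45, 65):
    print(limit, modeled_throughput_mbit(limit, page_mb=2))
# Model gives 160 for limits 5-20, 320 above 20, 480 above 40, 640 above 60:
# close to the measured 160/168, 336, 504 and 650 Mbit/s steps above.
```

The same model explains the 1 GB case: one 1 GB page per interval already saturates the gigabit wire, so any setspeed value below the page size per interval is ineffective.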
* [Qemu-devel] Reply: Reply: migrate_set_speed has no effect if the guest is using hugepages. 2019-07-12 12:34 ` [Qemu-devel] Reply: " Dr. David Alan Gilbert @ 2019-07-15 9:43 ` Lin Ma 2019-08-02 10:57 ` [Qemu-devel] Reply: " Lin Ma 2019-12-05 10:31 ` Reply: Reply: [Qemu-devel] " Dr. David Alan Gilbert 0 siblings, 2 replies; 7+ messages in thread
From: Lin Ma @ 2019-07-15 9:43 UTC (permalink / raw)
To: Dr. David Alan Gilbert; +Cc: qemu-devel

> -----Original Message-----
> From: Dr. David Alan Gilbert <dgilbert@redhat.com>
> Sent: July 12, 2019 20:34
> To: Lin Ma <LMa@suse.com>
> Cc: qemu-devel@nongnu.org
> Subject: Re: Reply: [Qemu-devel] migrate_set_speed has no effect if the guest is using hugepages.
>
> [...]
>
> OK, so migrate-setspeed takes a bandwidth in MBytes/sec and I guess your
> throughput is in MBit/sec - so at the higher end it's about right, and at the
> lower end it's way off.
>
> Let me think about a fix for this.
>
> What are you using to measure throughput?

I use the 'watch' command to observe the output of the qemu HMP command 'info migrate' and calculate the average of the throughput field during stage 2 of the live migration.

Thanks for taking the time to dig into this issue,
Lin
* [Qemu-devel] Reply: Reply: Reply: migrate_set_speed has no effect if the guest is using hugepages. 2019-07-15 9:43 ` [Qemu-devel] Reply: " Lin Ma @ 2019-08-02 10:57 ` Lin Ma 2019-12-05 10:31 ` Reply: Reply: [Qemu-devel] " Dr. David Alan Gilbert 1 sibling, 0 replies; 7+ messages in thread
From: Lin Ma @ 2019-08-02 10:57 UTC (permalink / raw)
To: Dr. David Alan Gilbert; +Cc: qemu-devel

Hi Dave,

May I ask whether you have any update on the fix?

Thanks,
Lin

> -----Original Message-----
> From: Qemu-devel <qemu-devel-bounces+lma=suse.com@nongnu.org> On Behalf Of Lin Ma
> Sent: July 15, 2019 17:43
> To: Dr. David Alan Gilbert <dgilbert@redhat.com>
> Cc: qemu-devel@nongnu.org
> Subject: [Qemu-devel] Reply: Reply: migrate_set_speed has no effect if the guest is using hugepages.
>
> [...]
* Re: Reply: Reply: [Qemu-devel] migrate_set_speed has no effect if the guest is using hugepages. 2019-07-15 9:43 ` [Qemu-devel] Reply: " Lin Ma 2019-08-02 10:57 ` [Qemu-devel] Reply: " Lin Ma @ 2019-12-05 10:31 ` Dr. David Alan Gilbert [not found] ` <BY5PR18MB331333470C356200DDF0A8DAC55B0@BY5PR18MB3313.namprd18.prod.outlook.com> 1 sibling, 1 reply; 7+ messages in thread
From: Dr. David Alan Gilbert @ 2019-12-05 10:31 UTC (permalink / raw)
To: Lin Ma; +Cc: qemu-devel

Hi Lin,

I've just posted 'migration: Rate limit inside host pages'; please let me know if it helps.

Dave

--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
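[Archive note: as its title suggests, the fix moves the rate-limit check inside each host (huge) page, throttling at target-page granularity. The sketch below is drawn only from the thread's description, with hypothetical names; it is not QEMU's actual code.]

```python
def send_huge_page(huge_page_bytes, target_page, rate_limiter):
    """Send one huge page as a sequence of target-sized chunks,
    consulting the rate limiter between chunks instead of only
    after the whole huge page has gone out. Returns the chunk count."""
    offset = 0
    chunks = 0
    while offset < huge_page_bytes:
        rate_limiter.wait_for_budget(target_page)  # throttle *inside* the page
        # ... put target_page bytes on the wire here ...
        offset += target_page
        chunks += 1
    return chunks

class Limiter:
    """Minimal stand-in limiter: it only counts what it admitted."""
    def __init__(self):
        self.admitted = 0
    def wait_for_budget(self, n):
        self.admitted += n   # a real limiter would sleep when over budget

lim = Limiter()
print(send_huge_page(2 << 20, 4096, lim))   # 2MB page in 4KB chunks → 512
```

Because the limiter now sees 4 KB increments rather than one 2 MB (or 1 GB) burst, migrate_set_speed can take effect at the same granularity as with normal pages.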
[parent not found: <BY5PR18MB331333470C356200DDF0A8DAC55B0@BY5PR18MB3313.namprd18.prod.outlook.com>]
* Re: Reply: Reply: Reply: [Qemu-devel] migrate_set_speed has no effect if the guest is using hugepages. [not found] ` <BY5PR18MB331333470C356200DDF0A8DAC55B0@BY5PR18MB3313.namprd18.prod.outlook.com> @ 2019-12-10 14:01 ` Dr. David Alan Gilbert 0 siblings, 0 replies; 7+ messages in thread
From: Dr. David Alan Gilbert @ 2019-12-10 14:01 UTC (permalink / raw)
To: Lin Ma; +Cc: qemu-devel

* Lin Ma (LMa@suse.com) wrote:
> Hi Dave,
>
> The patch fixed the issue; the rate limit with hugepages works well now.
> Thanks for your help!

No problem; thank you for reporting and testing it.

Dave

> Lin
>
> > -----Original Message-----
> > From: Dr. David Alan Gilbert <dgilbert@redhat.com>
> > Sent: December 5, 2019 18:32
> > To: Lin Ma <LMa@suse.com>
> > Cc: qemu-devel@nongnu.org
> > Subject: Re: Reply: Reply: [Qemu-devel] migrate_set_speed has no effect if the guest is using hugepages.
> >
> > [...]

--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
end of thread, other threads:[~2019-12-10 14:05 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-07-11 9:44 [Qemu-devel] migrate_set_speed has no effect if the guest is using hugepages Lin Ma
2019-07-11 10:24 ` Dr. David Alan Gilbert
[not found] ` <BY5PR18MB331347C441DA068E32BFDE53C5F20@BY5PR18MB3313.namprd18.prod.outlook.com>
2019-07-12 12:34 ` [Qemu-devel] Reply: " Dr. David Alan Gilbert
2019-07-15 9:43 ` [Qemu-devel] Reply: " Lin Ma
2019-08-02 10:57 ` [Qemu-devel] Reply: " Lin Ma
2019-12-05 10:31 ` Reply: Reply: [Qemu-devel] " Dr. David Alan Gilbert
[not found] ` <BY5PR18MB331333470C356200DDF0A8DAC55B0@BY5PR18MB3313.namprd18.prod.outlook.com>
2019-12-10 14:01 ` Reply: " Dr. David Alan Gilbert
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).