From: "Lin, Ray"
Subject: RE: swiotlb=force in Konrad's xen-pcifront-0.8.2 pvops domU kernel with PCI passthrough
Date: Thu, 18 Nov 2010 14:39:40 -0700
To: Dan Magenheimer, Dante Cinco, Konrad Wilk
Cc: Jeremy Fitzhardinge, Xen-devel, "mathieu.desnoyers@polymtl.ca", Andrew Thomas, "keir.fraser@eu.citrix.com", Chris Mason
List-Id: xen-devel@lists.xenproject.org

-----Original Message-----
From: xen-devel-bounces@lists.xensource.com [mailto:xen-devel-bounces@lists.xensource.com] On Behalf Of Dan Magenheimer
Sent: Thursday, November 18, 2010 1:21 PM
To: Dante Cinco; Konrad Wilk
Cc: Jeremy Fitzhardinge; Xen-devel; mathieu.desnoyers@polymtl.ca; Andrew Thomas; keir.fraser@eu.citrix.com; Chris Mason
Subject: RE: [Xen-devel] swiotlb=force in Konrad's xen-pcifront-0.8.2 pvops domU kernel with PCI passthrough

In case it is related:

http://lists.xensource.com/archives/html/xen-devel/2010-07/msg01247.html

Although I never went further on this investigation, it appeared to me that
pvclock_clocksource_read was getting called at least an order-of-magnitude
more frequently than expected in some circumstances for some kernels. And
IIRC it was scaled by the number of vcpus.

We did suspect it, since our old setting was HZ=1000 and we assigned more
than 10 VCPUs to domU. But we don't see the performance difference with
HZ=100.
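As a rough illustration of why HZ and the VCPU count were suspected: if each VCPU takes its own periodic timer tick at HZ, and each tick reads the clocksource at least once, the aggregate tick rate grows as HZ times the number of VCPUs. Both of those are assumptions made for this sketch, not something measured in this thread, and the VCPU count of 12 is hypothetical (the thread only says "more than 10").

```python
def aggregate_ticks_per_second(hz, vcpus):
    """Periodic timer ticks per second summed over all VCPUs.

    Assumes one independent tick stream per VCPU at frequency `hz`;
    this is a simplification, not a measurement from the thread.
    """
    return hz * vcpus


VCPUS = 12  # hypothetical; the thread only says "more than 10"

for hz in (1000, 100):
    total = aggregate_ticks_per_second(hz, VCPUS)
    print(f"HZ={hz}: {total} ticks/s across {VCPUS} VCPUs")
```

Under these assumptions, going from HZ=1000 to HZ=100 cuts the aggregate tick rate tenfold, so seeing no performance difference between the two settings suggests the periodic tick alone does not explain the pvclock_clocksource_read load.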
> -----Original Message-----
> From: Dante Cinco [mailto:dantecinco@gmail.com]
> Sent: Thursday, November 18, 2010 12:36 PM
> To: Konrad Rzeszutek Wilk
> Cc: Jeremy Fitzhardinge; Xen-devel; mathieu.desnoyers@polymtl.ca;
> Andrew Thomas; keir.fraser@eu.citrix.com; Chris Mason
> Subject: Re: [Xen-devel] swiotlb=force in Konrad's xen-pcifront-0.8.2
> pvops domU kernel with PCI passthrough
>
> I mentioned in a previous post to this thread that I'm able to apply
> Dulloor's xenoprofile patch to the dom0 kernel but not the domU kernel.
> So I can't do active-domain profiling. I'm able to do passive-domain
> profiling, but I don't know how reliable the results are, since it shows
> pvclock_clocksource_read as the top consumer of CPU cycles at 28%.
>
> CPU: Intel Architectural Perfmon, speed 2665.98 MHz (estimated)
> Counted CPU_CLK_UNHALTED events (Clock cycles when not halted) with a
> unit mask of 0x00 (No unit mask) count 100000
>
> samples %        image name                                                  app name                                              symbol name
> 918089  27.9310  vmlinux-2.6.36-rc7-pvops-kpcif-08-2-domu-5.11.dcinco-debug  domain1-kernel                                        pvclock_clocksource_read
> 217811  6.6265   domain1-modules                                             domain1-modules                                       /domain1-modules
> 188327  5.7295   vmlinux-2.6.32.25-pvops-stable-dom0-5.7.dcinco-debug        vmlinux-2.6.32.25-pvops-stable-dom0-5.7.dcinco-debug  mutex_spin_on_owner
> 186684  5.6795   vmlinux-2.6.36-rc7-pvops-kpcif-08-2-domu-5.11.dcinco-debug  domain1-kernel                                        __xen_spin_lock
> 149514  4.5487   vmlinux-2.6.36-rc7-pvops-kpcif-08-2-domu-5.11.dcinco-debug  domain1-kernel                                        __write_lock_failed
> 123278  3.7505   vmlinux-2.6.36-rc7-pvops-kpcif-08-2-domu-5.11.dcinco-debug  domain1-kernel                                        __kernel_text_address
> 122906  3.7392   vmlinux-2.6.36-rc7-pvops-kpcif-08-2-domu-5.11.dcinco-debug  domain1-kernel                                        xen_spin_unlock
> 90903   2.7655   vmlinux-2.6.36-rc7-pvops-kpcif-08-2-domu-5.11.dcinco-debug  domain1-kernel                                        __spin_time_accum
> 85880   2.6127   vmlinux-2.6.36-rc7-pvops-kpcif-08-2-domu-5.11.dcinco-debug  domain1-kernel                                        __module_address
> 75223   2.2885   vmlinux-2.6.36-rc7-pvops-kpcif-08-2-domu-5.11.dcinco-debug  domain1-kernel                                        print_context_stack
> 66778   2.0316   vmlinux-2.6.36-rc7-pvops-kpcif-08-2-domu-5.11.dcinco-debug  domain1-kernel                                        __module_text_address
> 57389   1.7459   vmlinux-2.6.36-rc7-pvops-kpcif-08-2-domu-5.11.dcinco-debug  domain1-kernel                                        is_module_text_address
> 47282   1.4385   xen-syms-4.1-unstable                                       domain1-xen                                           syscall_enter
> 47219   1.4365   vmlinux-2.6.36-rc7-pvops-kpcif-08-2-domu-5.11.dcinco-debug  domain1-kernel                                        prio_tree_insert
> 46495   1.4145   vmlinux-2.6.32.25-pvops-stable-dom0-5.7.dcinco-debug        vmlinux-2.6.32.25-pvops-stable-dom0-5.7.dcinco-debug  pvclock_clocksource_read
> 44501   1.3539   vmlinux-2.6.36-rc7-pvops-kpcif-08-2-domu-5.11.dcinco-debug  domain1-kernel                                        prio_tree_left
> 32482   0.9882   vmlinux-2.6.36-rc7-pvops-kpcif-08-2-domu-5.11.dcinco-debug  domain1-kernel                                        native_read_tsc
>
> I ran oprofile (0.9.5 with xenoprofile patch) for 20 seconds while the
> I/Os were running. Here's the command I used:
>
> opcontrol --start --xen=/boot/xen-syms-4.1-unstable \
>     --vmlinux=/boot/vmlinux-2.6.32.25-pvops-stable-dom0-5.7.dcinco-debug \
>     --passive-domains=1 \
>     --passive-images=/boot/vmlinux-2.6.36-rc7-pvops-kpcif-08-2-domu-5.11.dcinco-debug
>
> I had to remove dom0_max_vcpus=1 (but kept dom0_vcpus_pin=true) in the
> Xen command line. Otherwise, oprofile only gives the samples from
> CPU0.
>
> I'm going to try perf next.
>
> - Dante
>
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@lists.xensource.com
> http://lists.xensource.com/xen-devel