From: Stefano Stabellini
Subject: Re: Question about high CPU load during iperf ethernet testing
Date: Wed, 24 Sep 2014 18:57:02 +0100
To: Iurii Konovalenko
Cc: xen-devel@lists.xen.org, Ian Campbell, Julien Grall, Stefano Stabellini
List-Id: xen-devel@lists.xenproject.org

On Wed, 24 Sep 2014, Iurii Konovalenko wrote:
> Hi, Stefano!
> Thank you for your reply!
>
> On Tue, Sep 23, 2014 at 7:41 PM, Stefano Stabellini
> wrote:
> > On Mon, 22 Sep 2014, Iurii Konovalenko wrote:
> >> Hello, all!
> >>
> >> I am running iperf ethernet tests on a DRA7XX_EVM board (OMAP5).
> >> The Xen version is 4.4.
> >> I run only Linux (kernel 3.8) as Dom0, with no other active domains
> >> (for clean test results I decided not to start a DomU).
> >> The iperf server is started on the host, and the iperf client is
> >> started on the board with the command line "iperf -c 192.168.2.10
> >> -w 256k -m -f M -d -t 60".
> >
> > Just to double check: you are running the iperf test in Dom0, correct?
>
> Yes, iperf is running in Dom0.
>
> >> During the test I studied the CPU load with the top tool in Dom0 and
> >> saw that one VCPU is fully loaded, spending about 50% of its time in
> >> software IRQs and 50% in system.
> >> Running the same test on native Linux without Xen, I saw that the CPU
> >> load is about 2-4%.
> >>
> >> I decided to debug a bit, so I used
> >> "({register uint64_t _r; asm volatile("mrrc " "p15, 0, %0, %H0, c14" ";" : "=r" (_r)); _r; })"
> >> to read the timer counter before and after the operations I wanted
> >> to test.
> >>
> >> In this way I found that most of the CPU time is spent in the
> >> functions enable_irq/disable_irq_nosync and
> >> spin_lock_irqsave/spin_unlock_irqrestore (mostly in
> >> "mrs %0, cpsr @ arch_local_irq_save" / "msr cpsr_c, %0 @
> >> local_irq_restore"). When running without Xen it does not take nearly
> >> as much time.
> >
> > There is nothing Xen specific in the Linux ARM implementation of
> > spin_lock_irqsave/spin_unlock_irqrestore and
> > enable_irq/disable_irq_nosync.
>
> That is strange, because my measurements show that a lot of time is
> spent there when running in Dom0: for example, about 20% in
> spin_unlock_irqrestore (mostly in the mrs instruction).

Unless you are doing something wrong in your measurements, if you have
really narrowed it down to one instruction then I would try the same
test on a different SoC from another vendor to see whether it is
actually a hardware issue.

> >> So, could anyone help me with a couple of questions:
> >> 1. Is this normal behaviour?
> >
> > No, it is not normal.
> > Assuming that you assign all the memory to Dom0 and as many vcpus as
> > there are physical cpus on your platform, you should get the same
> > numbers as native.
>
> OK, so I might be doing something wrong.
>
> >> 2. Does the hypervisor trap the cpsr register? I suppose that the
> >> hypervisor traps access to the cpsr register, which leads to
> >> additional overhead, but I can't find the place in the sources where
> >> this happens.
> >
> > We don't trap cpsr.
>
> That is strange, because that was only an assumption of mine about
> where the time could be spent.
> So could you please advise where to look to understand the reason for
> such a high VCPU load?

I don't know.
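For reference, the Linux ARM implementation of these primitives is plain
inline assembly that never leaves the kernel; roughly, paraphrased from
memory of arch/arm/include/asm/irqflags.h (the ARMv6+ variant), it looks
like this:

    /* Read and save the CPSR, then mask IRQs with cpsid. */
    static inline unsigned long arch_local_irq_save(void)
    {
            unsigned long flags;

            asm volatile(
                    "mrs    %0, cpsr        @ arch_local_irq_save\n"
                    "cpsid  i"
                    : "=r" (flags) : : "memory", "cc");
            return flags;
    }

    /* Write the saved value back to the CPSR control field. */
    static inline void arch_local_irq_restore(unsigned long flags)
    {
            asm volatile(
                    "msr    cpsr_c, %0      @ local_irq_restore"
                    : : "r" (flags) : "memory", "cc");
    }

There is no hypercall or trap anywhere in that path.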
When you say that arch_local_irq_save is the one taking all the time,
do you actually have something like:

    time1 = read CNTVCT;
    arch_local_irq_save();
    time2 = read CNTVCT;
    printk(time2 - time1);

in your code?

> Best regards.
>
> Iurii Konovalenko | Senior Software Engineer
> GlobalLogic
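Concretely, such a measurement could look something like the sketch
below. read_cntvct() and time_irq_save() are hypothetical helper names,
not existing kernel functions; the counter read uses the same mrrc
access as your macro, but with opc1=1 to read the virtual counter
CNTVCT (your macro, with opc1=0, reads the physical counter CNTPCT):

    #include <linux/types.h>
    #include <linux/printk.h>
    #include <linux/irqflags.h>

    /* Hypothetical helper: read the 64-bit virtual counter CNTVCT
     * via mrrc p15 with opc1=1 (opc1=0 would be CNTPCT). */
    static inline u64 read_cntvct(void)
    {
            u64 r;

            asm volatile("mrrc p15, 1, %0, %H0, c14" : "=r" (r));
            return r;
    }

    /* Time a single arch_local_irq_save() call in counter ticks. */
    static void time_irq_save(void)
    {
            unsigned long flags;
            u64 t1, t2;

            t1 = read_cntvct();
            flags = arch_local_irq_save();
            t2 = read_cntvct();
            arch_local_irq_restore(flags);

            printk("arch_local_irq_save: %llu counter ticks\n", t2 - t1);
    }

The raw tick count can be converted to nanoseconds using the counter
frequency reported by CNTFRQ.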