* Load increase after memory upgrade (part2)
@ 2011-11-24 12:28 Carsten Schiers
  2011-11-25 18:42 ` Konrad Rzeszutek Wilk
  0 siblings, 1 reply; 66+ messages in thread
From: Carsten Schiers @ 2011-11-24 12:28 UTC (permalink / raw)
  To: konrad.wilk; +Cc: xen-devel


[-- Attachment #1.1: Type: text/plain, Size: 1878 bytes --]

Hello again, I would like to come back to that issue... sorry that I did not have the time up to now.

We are now talking about:

*	Xen 4.1.2
*	Dom0 is Jeremy's 2.6.32.46 64 bit
*	DomU in question is now 3.1.2 64 bit
*	Same thing if DomU is also 2.6.32.46
*	DomU owns two PCI cards (DVB-C) that do DMA
*	Machine has 8GB, Dom0 pinned at 512MB

Compared to the 2.6.34 kernel with backported patches, the load on the DomU is at least
twice as high. It will be "close to normal" if I reduce the memory used to 4GB.

As you can see from the attachment, you once had an idea. So should we try to find something...?

Carsten.

-----Original Message-----
To: konrad.wilk <konrad.wilk@oracle.com>
CC: linux <linux@eikelenboom.it>; xen-devel <xen-devel@lists.xensource.com>
From: Carsten Schiers <carsten@schiers.de>
Sent: Wed 29.06.2011 23:17
Subject: RE: Re: Re: Re: RE: Re: [Xen-devel] RE: Load increase after memory upgrade?
> Let's first do the c) experiment as that will likely explain your load average increase.
...
> > c). If you want to see if the fault here lies in the bounce buffer being used
> > more often in the DomU b/c you have 8GB of memory now and you end up using
> > more pages past 4GB (in DomU), I can cook up a patch to figure this out. But
> > an easier way is to just do (on the Xen hypervisor line): mem=4G and that
> > will make Xen think you only have 4GB of physical RAM. If the load comes back
> > to the normal "amount" then the likely culprit is that and we can think on
> > how to fix this.

You are on the right track. Load went down to the "normal" 10% when reducing
Xen to 4GB via the parameter. Load seems to be still a little bit lower
with the Xenified kernel (8-9%), but this is drastically lower than the 20% we had
before.
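
(For reference, the experiment as a GRUB legacy boot entry - the paths and
kernel file names here are illustrative, only the mem=4G on the Xen line is
the point:)

    title Xen 4.1.2 (4GB test)
    root (hd0,0)
    kernel /boot/xen.gz mem=4G dom0_mem=512M
    module /boot/vmlinuz-2.6.32.46-xen root=/dev/sda1 ro
    module /boot/initrd-2.6.32.46-xen.img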

[-- Attachment #1.2: Type: text/html, Size: 3084 bytes --]

[-- Attachment #2: Type: text/plain, Size: 138 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: Load increase after memory upgrade (part2)
  2011-11-24 12:28 Load increase after memory upgrade (part2) Carsten Schiers
@ 2011-11-25 18:42 ` Konrad Rzeszutek Wilk
  2011-11-25 22:11   ` Carsten Schiers
  2011-11-26  9:14   ` Carsten Schiers
  0 siblings, 2 replies; 66+ messages in thread
From: Konrad Rzeszutek Wilk @ 2011-11-25 18:42 UTC (permalink / raw)
  To: Carsten Schiers; +Cc: xen-devel, konrad.wilk

On Thu, Nov 24, 2011 at 01:28:44PM +0100, Carsten Schiers wrote:
> Hello again, I would like to come back to that issue... sorry that I did not have the time up to now.
> 
> We are now talking about:
> 
> *	Xen 4.1.2
> *	Dom0 is Jeremy's 2.6.32.46 64 bit
> *	DomU in question is now 3.1.2 64 bit
> *	Same thing if DomU is also 2.6.32.46
> *	DomU owns two PCI cards (DVB-C) that do DMA
> *	Machine has 8GB, Dom0 pinned at 512MB
> 
> Compared to the 2.6.34 kernel with backported patches, the load on the DomU is
> at least twice as high. It will be "close to normal" if I reduce the memory
> used to 4GB.

Is that in the dom0 or just in general on the machine?
> 
> As you can see from the attachment, you once had an idea. So should we try to find something...?

I think that was to instrument swiotlb to give an idea of how
often it is called and basically have a metric of its load. And
from there figure out if the issue is that:

 1). The drivers allocate/bounce/deallocate buffers on every interrupt
    (bad, the driver should be using some form of DMA pool, as sketched
    below - most of the ivtv drivers do that)

 2). The buffers allocated to the drivers are above the 4GB mark and we end
    up bouncing them needlessly. That can happen if the dom0 has most of
    the precious memory under 4GB. However, that is usually not the case,
    as the domain is usually allocated from the top of the memory. The
    fix for that was to set dom0_mem=max:XX ... but with Dom0 kernels
    before 3.1, the parameter would be ignored, so you had to use
    'mem=XX' on the Linux command line as well.

 3). Where did you get the load values? Was it dom0? or domU?
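
(A minimal sketch of the DMA-pool pattern recommended in 1), using the
standard linux/dmapool.h API; the pool name, sizes and the example_setup()
wrapper are made up for illustration:)

    #include <linux/dmapool.h>
    #include <linux/pci.h>

    static struct dma_pool *dvb_pool;   /* hypothetical pool for the card */

    static int example_setup(struct pci_dev *pdev)
    {
            dma_addr_t bus;
            void *buf;

            /* create the pool once, at probe time */
            dvb_pool = dma_pool_create("dvb-dma", &pdev->dev, 8192, 64, 0);
            if (!dvb_pool)
                    return -ENOMEM;

            /* per-transfer buffers then come from the pre-allocated,
             * DMA-able pool - no per-interrupt allocate/bounce/free */
            buf = dma_pool_alloc(dvb_pool, GFP_KERNEL, &bus);
            if (!buf)
                    return -ENOMEM;

            dma_pool_free(dvb_pool, buf, bus);
            dma_pool_destroy(dvb_pool);
            return 0;
    }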



> 
> Carsten.
> -----Original Message-----
> To: konrad.wilk <konrad.wilk@oracle.com>
> CC: linux <linux@eikelenboom.it>; xen-devel <xen-devel@lists.xensource.com>
> From: Carsten Schiers <carsten@schiers.de>
> Sent: Wed 29.06.2011 23:17
> Subject: RE: Re: Re: Re: RE: Re: [Xen-devel] RE: Load increase after memory upgrade?
> > Let's first do the c) experiment as that will likely explain your load average increase.
> ...
> > > c). If you want to see if the fault here lies in the bounce buffer being used
> > > more often in the DomU b/c you have 8GB of memory now and you end up using
> > > more pages past 4GB (in DomU), I can cook up a patch to figure this out. But
> > > an easier way is to just do (on the Xen hypervisor line): mem=4G and that
> > > will make Xen think you only have 4GB of physical RAM. If the load comes back
> > > to the normal "amount" then the likely culprit is that and we can think on
> > > how to fix this.
> 
> You are on the right track. Load went down to the "normal" 10% when reducing
> Xen to 4GB via the parameter. Load seems to be still a little bit lower
> with the Xenified kernel (8-9%), but this is drastically lower than the 20% we had
> before.

> _______________________________________________
> Xen-devel mailing list
> Xen-devel@lists.xensource.com
> http://lists.xensource.com/xen-devel

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: Load increase after memory upgrade (part2)
  2011-11-25 18:42 ` Konrad Rzeszutek Wilk
@ 2011-11-25 22:11   ` Carsten Schiers
  2011-11-28 15:28     ` Konrad Rzeszutek Wilk
  2011-11-26  9:14   ` Carsten Schiers
  1 sibling, 1 reply; 66+ messages in thread
From: Carsten Schiers @ 2011-11-25 22:11 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk; +Cc: xen-devel, konrad.wilk

I got the values in DomU. I will have

  - approx. 5% load in DomU with the 2.6.34 Xenified kernel
  - approx. 15% load in DomU with the 2.6.32.46 Jeremy or 3.1.2 kernel, with one card attached
  - approx. 30% load in DomU with the 2.6.32.46 Jeremy or 3.1.2 kernel, with two cards attached

I looked through my old mails from you, and you already explained the necessity of double
bounce buffering (PCI->below 4GB->above 4GB). What I don't understand is: why does the
Xenified kernel not have this kind of issue?

The driver in question is nearly identical between the two kernel versions. It is in
drivers/media/dvb/ttpci by the way, and if I understood the code right, the allocation in
question is:

        /* allocate and init buffers */
        av7110->debi_virt = pci_alloc_consistent(pdev, 8192, &av7110->debi_bus);
        if (!av7110->debi_virt)
                goto err_saa7146_vfree_4;

isn't it? I think the cards are constantly transferring the received stream through DMA.
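
A quick way to check where that buffer actually lands would be a one-off
debug print right after the allocation - a hypothetical debugging aid, not
something that is in the driver:

        dev_info(&pdev->dev, "debi buffer at bus address 0x%llx (%s 4GB)\n",
                 (unsigned long long)av7110->debi_bus,
                 (unsigned long long)av7110->debi_bus < (1ULL << 32)
                        ? "below" : "above");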

I have set dom0_mem=512M by the way, shall I change that in some way?

I can try out some things, if you want me to. But I have no idea what to do and where to
start, so I rely on your help...

Carsten.

-----Original Message-----
From: xen-devel-bounces@lists.xensource.com [mailto:xen-devel-bounces@lists.xensource.com] On Behalf Of Konrad Rzeszutek Wilk
Sent: Friday, 25 November 2011 19:43
To: Carsten Schiers
Cc: xen-devel; konrad.wilk
Subject: Re: [Xen-devel] Load increase after memory upgrade (part2)

On Thu, Nov 24, 2011 at 01:28:44PM +0100, Carsten Schiers wrote:
> Hello again, I would like to come back to that issue... sorry that I did not have the time up to now.
> 
> We are now talking about:
> 
> *	Xen 4.1.2
> *	Dom0 is Jeremy's 2.6.32.46 64 bit
> *	DomU in question is now 3.1.2 64 bit
> *	Same thing if DomU is also 2.6.32.46
> *	DomU owns two PCI cards (DVB-C) that do DMA
> *	Machine has 8GB, Dom0 pinned at 512MB
> 
> Compared to the 2.6.34 kernel with backported patches, the load on the DomU is
> at least twice as high. It will be "close to normal" if I reduce the memory
> used to 4GB.

Is that in the dom0 or just in general on the machine?
> 
> As you can see from the attachment, you once had an idea. So should we try to find something...?

I think that was to instrument swiotlb to give an idea of how
often it is called and basically have a metric of its load. And
from there figure out if the issue is that:

 1). The drivers allocate/bounce/deallocate buffers on every interrupt
    (bad, the driver should be using some form of DMA pool, and most of the
    ivtv drivers do that)

 2). The buffers allocated to the drivers are above the 4GB mark and we end
    up bouncing them needlessly. That can happen if the dom0 has most of
    the precious memory under 4GB. However, that is usually not the case,
    as the domain is usually allocated from the top of the memory. The
    fix for that was to set dom0_mem=max:XX ... but with Dom0 kernels
    before 3.1, the parameter would be ignored, so you had to use
    'mem=XX' on the Linux command line as well.

 3). Where did you get the load values? Was it dom0? or domU?



> 
> Carsten.
> -----Original Message-----
> To: konrad.wilk <konrad.wilk@oracle.com>
> CC: linux <linux@eikelenboom.it>; xen-devel <xen-devel@lists.xensource.com>
> From: Carsten Schiers <carsten@schiers.de>
> Sent: Wed 29.06.2011 23:17
> Subject: RE: Re: Re: Re: RE: Re: [Xen-devel] RE: Load increase after memory upgrade?
> > Let's first do the c) experiment as that will likely explain your load average increase.
> ...
> > > c). If you want to see if the fault here lies in the bounce buffer being used
> > > more often in the DomU b/c you have 8GB of memory now and you end up using
> > > more pages past 4GB (in DomU), I can cook up a patch to figure this out. But
> > > an easier way is to just do (on the Xen hypervisor line): mem=4G and that
> > > will make Xen think you only have 4GB of physical RAM. If the load comes back
> > > to the normal "amount" then the likely culprit is that and we can think on
> > > how to fix this.
> 
> You are on the right track. Load went down to the "normal" 10% when reducing
> Xen to 4GB via the parameter. Load seems to be still a little bit lower
> with the Xenified kernel (8-9%), but this is drastically lower than the 20% we had
> before.

> _______________________________________________
> Xen-devel mailing list
> Xen-devel@lists.xensource.com
> http://lists.xensource.com/xen-devel


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: Load increase after memory upgrade (part2)
  2011-11-25 18:42 ` Konrad Rzeszutek Wilk
  2011-11-25 22:11   ` Carsten Schiers
@ 2011-11-26  9:14   ` Carsten Schiers
  2011-11-28 15:30     ` Konrad Rzeszutek Wilk
  1 sibling, 1 reply; 66+ messages in thread
From: Carsten Schiers @ 2011-11-26  9:14 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk; +Cc: xen-devel, konrad.wilk

To add (read from some munin statistics I collected over time):

  - with load I mean the %CPU of xentop
  - there is no change in CPU usage of the DomU or Dom0
  - xenpm shows the core dedicated to that DomU is doing more work

Also I need to say that the reduction to 4GB was performed via the Xen parameter.

Carsten.


-----Original Message-----
From: Konrad Rzeszutek Wilk [mailto:konrad@darnok.org]
Sent: Friday, 25 November 2011 19:43
To: Carsten Schiers
Cc: konrad.wilk; xen-devel
Subject: Re: [Xen-devel] Load increase after memory upgrade (part2)

On Thu, Nov 24, 2011 at 01:28:44PM +0100, Carsten Schiers wrote:
> Hello again, I would like to come back to that issue... sorry that I did not have the time up to now.
> 
> We are now talking about:
> 
> *	Xen 4.1.2
> *	Dom0 is Jeremy's 2.6.32.46 64 bit
> *	DomU in question is now 3.1.2 64 bit
> *	Same thing if DomU is also 2.6.32.46
> *	DomU owns two PCI cards (DVB-C) that do DMA
> *	Machine has 8GB, Dom0 pinned at 512MB
> 
> Compared to the 2.6.34 kernel with backported patches, the load on the DomU is
> at least twice as high. It will be "close to normal" if I reduce the memory
> used to 4GB.

Is that in the dom0 or just in general on the machine?
> 
> As you can see from the attachment, you once had an idea. So should we try to find something...?

I think that was to instrument swiotlb to give an idea of how
often it is called and basically have a metric of its load. And
from there figure out if the issue is that:

 1). The drivers allocate/bounce/deallocate buffers on every interrupt
    (bad, the driver should be using some form of DMA pool, and most of the
    ivtv drivers do that)

 2). The buffers allocated to the drivers are above the 4GB mark and we end
    up bouncing them needlessly. That can happen if the dom0 has most of
    the precious memory under 4GB. However, that is usually not the case,
    as the domain is usually allocated from the top of the memory. The
    fix for that was to set dom0_mem=max:XX ... but with Dom0 kernels
    before 3.1, the parameter would be ignored, so you had to use
    'mem=XX' on the Linux command line as well.

 3). Where did you get the load values? Was it dom0? or domU?



> 
> Carsten.
> -----Original Message-----
> To: konrad.wilk <konrad.wilk@oracle.com>
> CC: linux <linux@eikelenboom.it>; xen-devel <xen-devel@lists.xensource.com>
> From: Carsten Schiers <carsten@schiers.de>
> Sent: Wed 29.06.2011 23:17
> Subject: RE: Re: Re: Re: RE: Re: [Xen-devel] RE: Load increase after memory upgrade?
> > Let's first do the c) experiment as that will likely explain your load average increase.
> ...
> > > c). If you want to see if the fault here lies in the bounce buffer being used
> > > more often in the DomU b/c you have 8GB of memory now and you end up using
> > > more pages past 4GB (in DomU), I can cook up a patch to figure this out. But
> > > an easier way is to just do (on the Xen hypervisor line): mem=4G and that
> > > will make Xen think you only have 4GB of physical RAM. If the load comes back
> > > to the normal "amount" then the likely culprit is that and we can think on
> > > how to fix this.
> 
> You are on the right track. Load went down to the "normal" 10% when reducing
> Xen to 4GB via the parameter. Load seems to be still a little bit lower
> with the Xenified kernel (8-9%), but this is drastically lower than the 20% we had
> before.

> _______________________________________________
> Xen-devel mailing list
> Xen-devel@lists.xensource.com
> http://lists.xensource.com/xen-devel

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: Load increase after memory upgrade (part2)
  2011-11-25 22:11   ` Carsten Schiers
@ 2011-11-28 15:28     ` Konrad Rzeszutek Wilk
  2011-11-28 15:40       ` Ian Campbell
  2011-11-28 15:52       ` Carsten Schiers
  0 siblings, 2 replies; 66+ messages in thread
From: Konrad Rzeszutek Wilk @ 2011-11-28 15:28 UTC (permalink / raw)
  To: Carsten Schiers, zhenzhong.duan, lersek; +Cc: xen-devel, konrad.wilk

On Fri, Nov 25, 2011 at 11:11:55PM +0100, Carsten Schiers wrote:
> I got the values in DomU. I will have
> 
>   - approx. 5% load in DomU with the 2.6.34 Xenified kernel
>   - approx. 15% load in DomU with the 2.6.32.46 Jeremy or 3.1.2 kernel, with one card attached
>   - approx. 30% load in DomU with the 2.6.32.46 Jeremy or 3.1.2 kernel, with two cards attached

HA!

I just wonder if the issue is that the reporting of CPU time spent is wrong.
Laszlo Ersek and Zhenzhong Duan have both reported a bug in the pvops
code when it came to accounting of CPU time.

> 
> I looked through my old mails from you, and you already explained the necessity of double
> bounce buffering (PCI->below 4GB->above 4GB). What I don't understand is: why does the
> Xenified kernel not have this kind of issue?

That is a puzzle. It should not. The code is very much the same - both
use the generic SWIOTLB which has not changed for years.
> 
> The driver in question is nearly identical between the two kernel versions. It is in
> drivers/media/dvb/ttpci by the way, and if I understood the code right, the allocation in
> question is:
> 
>         /* allocate and init buffers */
>         av7110->debi_virt = pci_alloc_consistent(pdev, 8192, &av7110->debi_bus);

Good. So it allocates it during init and uses it.

>         if (!av7110->debi_virt)
>                 goto err_saa7146_vfree_4;
> 
> isn't it? I think the cards are constantly transferring the received stream through DMA.

Yeah, and that memory is set aside for the life of the driver. So there
should be no bounce buffering happening (as it allocated the memory
below the 4GB mark).
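
(For illustration, a minimal sketch of that lifetime with the legacy PCI DMA
API - the probe/remove wrappers are made up, only the
pci_alloc_consistent/pci_free_consistent pair is the real interface:)

    #include <linux/pci.h>

    static void *debi_virt;
    static dma_addr_t debi_bus;

    static int example_probe(struct pci_dev *pdev)
    {
            /* allocated once, below the device's coherent DMA mask (4GB
             * here), and kept for the whole life of the driver */
            debi_virt = pci_alloc_consistent(pdev, 8192, &debi_bus);
            if (!debi_virt)
                    return -ENOMEM;
            return 0;
    }

    static void example_remove(struct pci_dev *pdev)
    {
            pci_free_consistent(pdev, 8192, debi_virt, debi_bus);
    }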
> 
> I have set dom0_mem=512M by the way, shall I change that in some way?

Does the reporting (CPU usage of DomU) change in any way with that?
> 
> I can try out some things, if you want me to. But I have no idea what to do and where to
> start, so I rely on your help...
> 
> Carsten.
> 
> -----Original Message-----
> From: xen-devel-bounces@lists.xensource.com [mailto:xen-devel-bounces@lists.xensource.com] On Behalf Of Konrad Rzeszutek Wilk
> Sent: Friday, 25 November 2011 19:43
> To: Carsten Schiers
> Cc: xen-devel; konrad.wilk
> Subject: Re: [Xen-devel] Load increase after memory upgrade (part2)
> 
> On Thu, Nov 24, 2011 at 01:28:44PM +0100, Carsten Schiers wrote:
> > Hello again, I would like to come back to that issue... sorry that I did not have the time up to now.
> > 
> > We are now talking about:
> > 
> > *	Xen 4.1.2
> > *	Dom0 is Jeremy's 2.6.32.46 64 bit
> > *	DomU in question is now 3.1.2 64 bit
> > *	Same thing if DomU is also 2.6.32.46
> > *	DomU owns two PCI cards (DVB-C) that do DMA
> > *	Machine has 8GB, Dom0 pinned at 512MB
> > 
> > Compared to the 2.6.34 kernel with backported patches, the load on the DomU is
> > at least twice as high. It will be "close to normal" if I reduce the memory
> > used to 4GB.
> 
> Is that in the dom0 or just in general on the machine?
> > 
> > As you can see from the attachment, you once had an idea. So should we try to find something...?
> 
> I think that was to instrument swiotlb to give an idea of how
> often it is called and basically have a metric of its load. And
> from there figure out if the issue is that:
> 
>  1). The drivers allocate/bounce/deallocate buffers on every interrupt
>     (bad, the driver should be using some form of DMA pool, and most of the
>     ivtv drivers do that)
> 
>  2). The buffers allocated to the drivers are above the 4GB mark and we end
>     up bouncing them needlessly. That can happen if the dom0 has most of
>     the precious memory under 4GB. However, that is usually not the case,
>     as the domain is usually allocated from the top of the memory. The
>     fix for that was to set dom0_mem=max:XX ... but with Dom0 kernels
>     before 3.1, the parameter would be ignored, so you had to use
>     'mem=XX' on the Linux command line as well.
> 
>  3). Where did you get the load values? Was it dom0? or domU?
> 
> 
> 
> > 
> > Carsten.
> > -----Original Message-----
> > To: konrad.wilk <konrad.wilk@oracle.com>
> > CC: linux <linux@eikelenboom.it>; xen-devel <xen-devel@lists.xensource.com>
> > From: Carsten Schiers <carsten@schiers.de>
> > Sent: Wed 29.06.2011 23:17
> > Subject: RE: Re: Re: Re: RE: Re: [Xen-devel] RE: Load increase after memory upgrade?
> > > Let's first do the c) experiment as that will likely explain your load average increase.
> > ...
> > > > c). If you want to see if the fault here lies in the bounce buffer being used
> > > > more often in the DomU b/c you have 8GB of memory now and you end up using
> > > > more pages past 4GB (in DomU), I can cook up a patch to figure this out. But
> > > > an easier way is to just do (on the Xen hypervisor line): mem=4G and that
> > > > will make Xen think you only have 4GB of physical RAM. If the load comes back
> > > > to the normal "amount" then the likely culprit is that and we can think on
> > > > how to fix this.
> > 
> > You are on the right track. Load went down to the "normal" 10% when reducing
> > Xen to 4GB via the parameter. Load seems to be still a little bit lower
> > with the Xenified kernel (8-9%), but this is drastically lower than the 20% we had
> > before.
> 
> > _______________________________________________
> > Xen-devel mailing list
> > Xen-devel@lists.xensource.com
> > http://lists.xensource.com/xen-devel
> 
> 
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@lists.xensource.com
> http://lists.xensource.com/xen-devel
> 

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: Load increase after memory upgrade (part2)
  2011-11-26  9:14   ` Carsten Schiers
@ 2011-11-28 15:30     ` Konrad Rzeszutek Wilk
  2011-11-29  9:42       ` Carsten Schiers
  0 siblings, 1 reply; 66+ messages in thread
From: Konrad Rzeszutek Wilk @ 2011-11-28 15:30 UTC (permalink / raw)
  To: Carsten Schiers; +Cc: xen-devel, konrad.wilk

On Sat, Nov 26, 2011 at 10:14:08AM +0100, Carsten Schiers wrote:
> To add (read from some munin statistics I collected over time):
> 
>   - with load I mean the %CPU of xentop
>   - there is no change in CPU usage of the DomU or Dom0

Uhh, which metric are you using for that? CPU usage...? Is this if you
change the DomU or the amount of memory the guest has? This is not
the load number (the xentop value)?

>   - xenpm shows the core dedicated to that DomU is doing more work
> 
> Also I need to say that the reduction to 4GB was performed via the Xen parameter.
> 
> Carsten.
> 
> 
> -----Original Message-----
> From: Konrad Rzeszutek Wilk [mailto:konrad@darnok.org]
> Sent: Friday, 25 November 2011 19:43
> To: Carsten Schiers
> Cc: konrad.wilk; xen-devel
> Subject: Re: [Xen-devel] Load increase after memory upgrade (part2)
> 
> On Thu, Nov 24, 2011 at 01:28:44PM +0100, Carsten Schiers wrote:
> > Hello again, I would like to come back to that issue... sorry that I did not have the time up to now.
> > 
> > We are now talking about:
> > 
> > *	Xen 4.1.2
> > *	Dom0 is Jeremy's 2.6.32.46 64 bit
> > *	DomU in question is now 3.1.2 64 bit
> > *	Same thing if DomU is also 2.6.32.46
> > *	DomU owns two PCI cards (DVB-C) that do DMA
> > *	Machine has 8GB, Dom0 pinned at 512MB
> > 
> > Compared to the 2.6.34 kernel with backported patches, the load on the DomU is
> > at least twice as high. It will be "close to normal" if I reduce the memory
> > used to 4GB.
> 
> Is that in the dom0 or just in general on the machine?
> > 
> > As you can see from the attachment, you once had an idea. So should we try to find something...?
> 
> I think that was to instrument swiotlb to give an idea of how
> often it is called and basically have a metric of its load. And
> from there figure out if the issue is that:
> 
>  1). The drivers allocate/bounce/deallocate buffers on every interrupt
>     (bad, the driver should be using some form of DMA pool, and most of the
>     ivtv drivers do that)
> 
>  2). The buffers allocated to the drivers are above the 4GB mark and we end
>     up bouncing them needlessly. That can happen if the dom0 has most of
>     the precious memory under 4GB. However, that is usually not the case,
>     as the domain is usually allocated from the top of the memory. The
>     fix for that was to set dom0_mem=max:XX ... but with Dom0 kernels
>     before 3.1, the parameter would be ignored, so you had to use
>     'mem=XX' on the Linux command line as well.
> 
>  3). Where did you get the load values? Was it dom0? or domU?
> 
> 
> 
> > 
> > Carsten.
> > -----Original Message-----
> > To: konrad.wilk <konrad.wilk@oracle.com>
> > CC: linux <linux@eikelenboom.it>; xen-devel <xen-devel@lists.xensource.com>
> > From: Carsten Schiers <carsten@schiers.de>
> > Sent: Wed 29.06.2011 23:17
> > Subject: RE: Re: Re: Re: RE: Re: [Xen-devel] RE: Load increase after memory upgrade?
> > > Let's first do the c) experiment as that will likely explain your load average increase.
> > ...
> > > > c). If you want to see if the fault here lies in the bounce buffer being used
> > > > more often in the DomU b/c you have 8GB of memory now and you end up using
> > > > more pages past 4GB (in DomU), I can cook up a patch to figure this out. But
> > > > an easier way is to just do (on the Xen hypervisor line): mem=4G and that
> > > > will make Xen think you only have 4GB of physical RAM. If the load comes back
> > > > to the normal "amount" then the likely culprit is that and we can think on
> > > > how to fix this.
> > 
> > You are on the right track. Load went down to the "normal" 10% when reducing
> > Xen to 4GB via the parameter. Load seems to be still a little bit lower
> > with the Xenified kernel (8-9%), but this is drastically lower than the 20% we had
> > before.
> 
> > _______________________________________________
> > Xen-devel mailing list
> > Xen-devel@lists.xensource.com
> > http://lists.xensource.com/xen-devel
> 
> 

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: Load increase after memory upgrade (part2)
  2011-11-28 15:28     ` Konrad Rzeszutek Wilk
@ 2011-11-28 15:40       ` Ian Campbell
  2011-11-28 16:45         ` Konrad Rzeszutek Wilk
                           ` (2 more replies)
  2011-11-28 15:52       ` Carsten Schiers
  1 sibling, 3 replies; 66+ messages in thread
From: Ian Campbell @ 2011-11-28 15:40 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk
  Cc: lersek, xen-devel, Carsten Schiers, zhenzhong.duan, konrad.wilk

On Mon, 2011-11-28 at 15:28 +0000, Konrad Rzeszutek Wilk wrote:
> On Fri, Nov 25, 2011 at 11:11:55PM +0100, Carsten Schiers wrote:

> > I looked through my old mails from you, and you already explained the necessity of double
> > bounce buffering (PCI->below 4GB->above 4GB). What I don't understand is: why does the
> > Xenified kernel not have this kind of issue?
> 
> That is a puzzle. It should not. The code is very much the same - both
> use the generic SWIOTLB which has not changed for years.

The swiotlb-xen used by classic-xen kernels (which I assume is what
Carsten means by "Xenified") isn't exactly the same as the stuff in
mainline Linux, it's been heavily refactored for one thing. It's not
impossible that mainline is bouncing something it doesn't really need
to.

It's also possible that the dma mask of the device is different/wrong in
mainline leading to such additional bouncing.

I guess it's also possible that the classic-Xen kernels are playing fast
and loose by not bouncing something they should (although if so they
appear to be getting away with it...) or that there is some difference
which really means mainline needs to bounce while classic-Xen doesn't.
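
(For reference, the per-mapping decision mainline swiotlb makes, simplified
from lib/swiotlb.c of that era - not the verbatim kernel code - which shows
how the device's dma mask drives the decision to bounce:)

    /* inside swiotlb_map_page(), roughly: */
    phys_addr_t paddr = page_to_phys(page) + offset;
    dma_addr_t dev_addr = phys_to_dma(dev, paddr);

    /* if the device can reach the buffer, hand it out as-is... */
    if (dma_capable(dev, dev_addr, size) && !swiotlb_force)
            return dev_addr;

    /* ...otherwise copy through a bounce buffer below the dma mask */
    map = map_single(dev, paddr, size, dir);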

Ian.

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: Load increase after memory upgrade (part2)
  2011-11-28 15:28     ` Konrad Rzeszutek Wilk
  2011-11-28 15:40       ` Ian Campbell
@ 2011-11-28 15:52       ` Carsten Schiers
  1 sibling, 0 replies; 66+ messages in thread
From: Carsten Schiers @ 2011-11-28 15:52 UTC (permalink / raw)
  To: lersek, zhenzhong.duan, Konrad Rzeszutek Wilk; +Cc: xen-devel, konrad.wilk


[-- Attachment #1.1: Type: text/plain, Size: 7514 bytes --]

Hi,

let me try to explain a bit more. Here you see the output of my xentop munin graph for a
week. Only take a look at the bluish buckle. Notice the small step in front? So it's the CPU
permille used by the DomU that owns the cards. The small buckle is when I only put in
one PCI card. Afterwards there is constantly a noticeably higher load. See that Dom0 (green)
is not impacted. I am back to the Xenified kernel, as you can see.

[xentop munin graph]

In the next picture you see the output of xenpm visualized. So this might be an indicator that
really something happens. It's only the core that I dedicated to that DomU. I have a three-core
AMD CPU by the way:

[xenpm per-core graph]

In the CPU usage of the Dom0, there is nothing to see:

[Dom0 CPU usage graph]

In the CPU usage of the DomU, there is also not much to see, possibly a very slight change of
mix:

[DomU CPU usage graph]

There is a slight increase in sleeping jobs at the time slot in question; I guess nothing we can
directly map to the issue:

[DomU process states graph]

If you need other charts, I can try to produce them.

BR,
Carsten.

 
-----Original Message-----
To: Carsten Schiers <carsten@schiers.de>; zhenzhong.duan@oracle.com; lersek@redhat.com
CC: xen-devel <xen-devel@lists.xensource.com>; konrad.wilk <konrad.wilk@oracle.com>
From: Konrad Rzeszutek Wilk <konrad@darnok.org>
Sent: Mon 28.11.2011 16:33
Subject: Re: [Xen-devel] Load increase after memory upgrade (part2)
On Fri, Nov 25, 2011 at 11:11:55PM +0100, Carsten Schiers wrote:
> I got the values in DomU. I will have
> 
>   - approx. 5% load in DomU with the 2.6.34 Xenified kernel
>   - approx. 15% load in DomU with the 2.6.32.46 Jeremy or 3.1.2 kernel, with one card attached
>   - approx. 30% load in DomU with the 2.6.32.46 Jeremy or 3.1.2 kernel, with two cards attached

HA!

I just wonder if the issue is that the reporting of CPU time spent is wrong.
Laszlo Ersek and Zhenzhong Duan have both reported a bug in the pvops
code when it came to accounting of CPU time.

> 
> I looked through my old mails from you, and you already explained the necessity of double
> bounce buffering (PCI->below 4GB->above 4GB). What I don't understand is: why does the
> Xenified kernel not have this kind of issue?

That is a puzzle. It should not. The code is very much the same - both
use the generic SWIOTLB which has not changed for years.
> 
> The driver in question is nearly identical between the two kernel versions. It is in
> drivers/media/dvb/ttpci by the way, and if I understood the code right, the allocation in
> question is:
> 
>         /* allocate and init buffers */
>         av7110->debi_virt = pci_alloc_consistent(pdev, 8192, &av7110->debi_bus);

Good. So it allocates it during init and uses it.

>         if (!av7110->debi_virt)
>                 goto err_saa7146_vfree_4;
> 
> isn't it? I think the cards are constantly transferring the received stream through DMA.

Yeah, and that memory is set aside for the life of the driver. So there
should be no bounce buffering happening (as it allocated the memory
below the 4GB mark).
> 
> I have set dom0_mem=512M by the way, shall I change that in some way?

Does the reporting (CPU usage of DomU) change in any way with that?
> 
> I can try out some things, if you want me to. But I have no idea what to do and where to
> start, so I rely on your help...
> 
> Carsten.
> 
> -----Original Message-----
> From: xen-devel-bounces@lists.xensource.com [mailto:xen-devel-bounces@lists.xensource.com] On Behalf Of Konrad Rzeszutek Wilk
> Sent: Friday, 25 November 2011 19:43
> To: Carsten Schiers
> Cc: xen-devel; konrad.wilk
> Subject: Re: [Xen-devel] Load increase after memory upgrade (part2)
> 
> On Thu, Nov 24, 2011 at 01:28:44PM +0100, Carsten Schiers wrote:
> > Hello again, I would like to come back to that issue... sorry that I did not have the time up to now.
> > 
> > We are now talking about:
> > 
> > *	Xen 4.1.2
> > *	Dom0 is Jeremy's 2.6.32.46 64 bit
> > *	DomU in question is now 3.1.2 64 bit
> > *	Same thing if DomU is also 2.6.32.46
> > *	DomU owns two PCI cards (DVB-C) that do DMA
> > *	Machine has 8GB, Dom0 pinned at 512MB
> > 
> > Compared to the 2.6.34 kernel with backported patches, the load on the DomU is
> > at least twice as high. It will be "close to normal" if I reduce the memory
> > used to 4GB.
> 
> Is that in the dom0 or just in general on the machine?
> > 
> > As you can see from the attachment, you once had an idea. So should we try to find something...?
> 
> I think that was to instrument swiotlb to give an idea of how
> often it is called and basically have a metric of its load. And
> from there figure out if the issue is that:
> 
>  1). The drivers allocate/bounce/deallocate buffers on every interrupt
>     (bad, the driver should be using some form of DMA pool, and most of the
>     ivtv drivers do that)
> 
>  2). The buffers allocated to the drivers are above the 4GB mark and we end
>     up bouncing them needlessly. That can happen if the dom0 has most of
>     the precious memory under 4GB. However, that is usually not the case,
>     as the domain is usually allocated from the top of the memory. The
>     fix for that was to set dom0_mem=max:XX ... but with Dom0 kernels
>     before 3.1, the parameter would be ignored, so you had to use
>     'mem=XX' on the Linux command line as well.
> 
>  3). Where did you get the load values? Was it dom0? or domU?
> 
> 
> 
> > 
> > Carsten.
> > -----Original Message-----
> > To: konrad.wilk <konrad.wilk@oracle.com>
> > CC: linux <linux@eikelenboom.it>; xen-devel <xen-devel@lists.xensource.com>
> > From: Carsten Schiers <carsten@schiers.de>
> > Sent: Wed 29.06.2011 23:17
> > Subject: RE: Re: Re: Re: RE: Re: [Xen-devel] RE: Load increase after memory upgrade?
> > > Let's first do the c) experiment as that will likely explain your load average increase.
> > ...
> > > > c). If you want to see if the fault here lies in the bounce buffer being used
> > > > more often in the DomU b/c you have 8GB of memory now and you end up using
> > > > more pages past 4GB (in DomU), I can cook up a patch to figure this out. But
> > > > an easier way is to just do (on the Xen hypervisor line): mem=4G and that
> > > > will make Xen think you only have 4GB of physical RAM. If the load comes back
> > > > to the normal "amount" then the likely culprit is that and we can think on
> > > > how to fix this.
> > 
> > You are on the right track. Load went down to the "normal" 10% when reducing
> > Xen to 4GB via the parameter. Load seems to be still a little bit lower
> > with the Xenified kernel (8-9%), but this is drastically lower than the 20% we had
> > before.
> 
> > _______________________________________________
> > Xen-devel mailing list
> > Xen-devel@lists.xensource.com
> > http://lists.xensource.com/xen-devel
> 
> 
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@lists.xensource.com
> http://lists.xensource.com/xen-devel
> 

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel

[-- Attachment #1.2: Type: text/html, Size: 119734 bytes --]

[-- Attachment #2: Type: text/plain, Size: 138 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: Load increase after memory upgrade (part2)
  2011-11-28 15:40       ` Ian Campbell
@ 2011-11-28 16:45         ` Konrad Rzeszutek Wilk
  2011-11-29  8:31           ` Jan Beulich
                             ` (2 more replies)
  2011-11-28 16:58         ` Laszlo Ersek
  2011-11-29  9:37         ` Carsten Schiers
  2 siblings, 3 replies; 66+ messages in thread
From: Konrad Rzeszutek Wilk @ 2011-11-28 16:45 UTC (permalink / raw)
  To: Ian Campbell
  Cc: Konrad Rzeszutek Wilk, xen-devel, Carsten Schiers,
	zhenzhong.duan, lersek

On Mon, Nov 28, 2011 at 03:40:13PM +0000, Ian Campbell wrote:
> On Mon, 2011-11-28 at 15:28 +0000, Konrad Rzeszutek Wilk wrote:
> > On Fri, Nov 25, 2011 at 11:11:55PM +0100, Carsten Schiers wrote:
> 
> > > I looked through my old mails from you, and you already explained the necessity of double
> > > bounce buffering (PCI->below 4GB->above 4GB). What I don't understand is: why does the
> > > Xenified kernel not have this kind of issue?
> > 
> > That is a puzzle. It should not. The code is very much the same - both
> > use the generic SWIOTLB which has not changed for years.
> 
> The swiotlb-xen used by classic-xen kernels (which I assume is what
> Carsten means by "Xenified") isn't exactly the same as the stuff in
> mainline Linux, it's been heavily refactored for one thing. It's not
> impossible that mainline is bouncing something it doesn't really need
> to.

The usage, at least with 'pci_alloc_coherent', is that there is no bouncing
being done. The alloc_coherent will allocate a nice page underneath the 4GB
mark and give it to the driver. The driver can use it as it wishes and there
is no need to bounce buffer.

But I can't find the implementation of that in the classic Xen-SWIOTLB. It looks
as if it is using map_single, which would be taking the memory out of the
pool for a very long time, instead of allocating memory and "swizzling" the MFNs.
[Note, I looked at the 2.6.18 hg tree for classic; the 2.6.34 is probably
much improved, so let me check that]

Carsten, let me prep up a patch that will print some diagnostic information
at runtime - to see how often it does the bounce, the usage, etc.
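
(A hypothetical sketch of what such instrumentation could look like - a
counter in the swiotlb bounce path, reported about once a second; this is
not the actual patch:)

    /* e.g. at the top of swiotlb_bounce() in lib/swiotlb.c */
    static unsigned long bounce_count;
    static unsigned long last_report;

    bounce_count++;
    if (time_after(jiffies, last_report + HZ)) {
            printk(KERN_DEBUG "swiotlb: %lu bounces in the last second\n",
                   bounce_count);
            bounce_count = 0;
            last_report = jiffies;
    }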

> 
> It's also possible that the dma mask of the device is different/wrong in
> mainline leading to such additional bouncing.

If one were to use map_page and such - yes. But the alloc_coherent bypasses
that and ends up allocating it right under the 4GB (or rather it allocates
based on the dev->coherent_dma_mask and swizzles the MFNs as required).

> 
> I guess it's also possible that the classic-Xen kernels are playing fast
> and loose by not bouncing something they should (although if so they
> appear to be getting away with it...) or that there is some difference
> which really means mainline needs to bounce while classic-Xen doesn't.

<nods> Could be very well.
> 
> Ian.
> 

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: Load increase after memory upgrade (part2)
  2011-11-28 15:40       ` Ian Campbell
  2011-11-28 16:45         ` Konrad Rzeszutek Wilk
@ 2011-11-28 16:58         ` Laszlo Ersek
  2011-11-29  9:37         ` Carsten Schiers
  2 siblings, 0 replies; 66+ messages in thread
From: Laszlo Ersek @ 2011-11-28 16:58 UTC (permalink / raw)
  To: Ian Campbell
  Cc: Konrad Rzeszutek Wilk, xen-devel, Carsten Schiers,
	zhenzhong.duan, konrad.wilk

On 11/28/11 16:40, Ian Campbell wrote:
> On Mon, 2011-11-28 at 15:28 +0000, Konrad Rzeszutek Wilk wrote:
>> On Fri, Nov 25, 2011 at 11:11:55PM +0100, Carsten Schiers wrote:
>
>>> I looked through my old mails from you, and you already explained the necessity of double
>>> bounce buffering (PCI->below 4GB->above 4GB). What I don't understand is: why does the
>>> Xenified kernel not have this kind of issue?
>>
>> That is a puzzle. It should not. The code is very much the same - both
>> use the generic SWIOTLB which has not changed for years.
>
> The swiotlb-xen used by classic-xen kernels (which I assume is what
> Carsten means by "Xenified") isn't exactly the same as the stuff in
> mainline Linux, it's been heavily refactored for one thing. It's not
> impossible that mainline is bouncing something it doesn't really need
> to.

Please excuse me if I'm completely mistaken; my only point of reference 
is that we recently had to backport 
<http://xenbits.xensource.com/hg/linux-2.6.18-xen.hg/rev/940>.

> It's also possible that the dma mask of the device is different/wrong in
> mainline leading to such additional bouncing.

dma_alloc_coherent() -- which I guess is the precursor of 
pci_alloc_consistent() -- asks xen_create_contiguous_region() to back 
the vaddr range with frames machine-addressable inside the device's dma
mask. xen_create_contiguous_region() seems to land in a XENMEM_exchange 
hypercall (among others). Perhaps this extra layer of indirection allows 
the driver to use low pages directly, without bounce buffers.
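
(Condensed from the classic-Xen dma_generic_alloc_coherent() - the full
pci-dma-xen.c is attached later in this thread - showing the exchange-based
path described above:)

    /* allocate any pages, then ask Xen to swap in machine frames that
     * are contiguous and below the device's dma mask */
    page = alloc_pages_node(dev_to_node(dev), flag, order);
    memory = page_address(page);
    if (xen_create_contiguous_region((unsigned long)memory, order,
                                     fls64(dma_mask))) {
            __free_pages(page, order);
            return NULL;        /* Xen could not provide suitable frames */
    }
    *dma_addr = virt_to_bus(memory);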

> I guess it's also possible that the classic-Xen kernels are playing fast
> and loose by not bouncing something they should (although if so they
> appear to be getting away with it...) or that there is some difference
> which really means mainline needs to bounce while classic-Xen doesn't.

I'm sorry if what I just posted is painfully stupid. I'm taking the risk 
for the 1% chance that it could be helpful.

Wrt. the idle time accounting problem, after Niall's two pings, I'm also 
waiting for a verdict, and/or for myself finding the time and fishing 
out the current patches.

Laszlo

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: Load increase after memory upgrade (part2)
  2011-11-28 16:45         ` Konrad Rzeszutek Wilk
@ 2011-11-29  8:31           ` Jan Beulich
  2011-11-29  9:31             ` Carsten Schiers
  2011-11-29  9:46           ` Carsten Schiers
  2011-11-29 10:23           ` Ian Campbell
  2 siblings, 1 reply; 66+ messages in thread
From: Jan Beulich @ 2011-11-29  8:31 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk
  Cc: xen-devel, Ian Campbell, lersek, zhenzhong.duan,
	Konrad Rzeszutek Wilk, Carsten Schiers

>>> On 28.11.11 at 17:45, Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> wrote:
> But I can't find the implementation of that in the classic Xen-SWIOTLB.

linux-2.6.18-xen.hg/arch/i386/kernel/pci-dma-xen.c:dma_alloc_coherent().

Jan

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: Load increase after memory upgrade (part2)
  2011-11-29  8:31           ` Jan Beulich
@ 2011-11-29  9:31             ` Carsten Schiers
  0 siblings, 0 replies; 66+ messages in thread
From: Carsten Schiers @ 2011-11-29  9:31 UTC (permalink / raw)
  To: Jan Beulich, Konrad Rzeszutek Wilk
  Cc: Konrad Rzeszutek Wilk, xen-devel, lersek, zhenzhong.duan, Ian Campbell


[-- Attachment #1.1: Type: text/plain, Size: 767 bytes --]

I attached the actually used 2.6.34 file here, if that helps. BR, C.
 
-----Original Message-----
To: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
CC: Ian Campbell <Ian.Campbell@citrix.com>; Konrad Rzeszutek Wilk <konrad@darnok.org>; xen-devel <xen-devel@lists.xensource.com>; zhenzhong.duan@oracle.com; lersek@redhat.com; Carsten Schiers <carsten@schiers.de>
From: Jan Beulich <JBeulich@suse.com>
Sent: Tue 29.11.2011 09:52
Subject: Re: [Xen-devel] Load increase after memory upgrade (part2)
>>> On 28.11.11 at 17:45, Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> wrote:
> But I can't find the implementation of that in the classic Xen-SWIOTLB.

linux-2.6.18-xen.hg/arch/i386/kernel/pci-dma-xen.c:dma_alloc_coherent().

Jan


[-- Attachment #1.2: Type: text/html, Size: 1823 bytes --]

[-- Attachment #2: pci-dma-xen.c --]
[-- Type: application/octet-stream, Size: 9609 bytes --]

#include <linux/dma-mapping.h>
#include <linux/dma-debug.h>
#include <linux/dmar.h>
#include <linux/bootmem.h>
#include <linux/gfp.h>
#include <linux/pci.h>
#include <linux/kmemleak.h>

#include <asm/proto.h>
#include <asm/dma.h>
#include <asm/iommu.h>
#include <asm/gart.h>
#include <asm/calgary.h>
#include <asm/amd_iommu.h>
#include <asm/x86_init.h>

static int forbid_dac __read_mostly;

struct dma_map_ops *dma_ops = &nommu_dma_ops;
EXPORT_SYMBOL(dma_ops);

static int iommu_sac_force __read_mostly;

#ifdef CONFIG_IOMMU_DEBUG
int panic_on_overflow __read_mostly = 1;
int force_iommu __read_mostly = 1;
#else
int panic_on_overflow __read_mostly = 0;
int force_iommu __read_mostly = 0;
#endif

int iommu_merge __read_mostly = 0;

int no_iommu __read_mostly;
/* Set this to 1 if there is a HW IOMMU in the system */
int iommu_detected __read_mostly = 0;

/*
 * This variable becomes 1 if iommu=pt is passed on the kernel command line.
 * If this variable is 1, IOMMU implementations do no DMA translation for
 * devices and allow every device to access to whole physical memory. This is
 * useful if a user wants to use an IOMMU only for KVM device assignment to
 * guests and not for driver dma translation.
 */
int iommu_pass_through __read_mostly;

/* Dummy device used for NULL arguments (normally ISA). */
struct device x86_dma_fallback_dev = {
	.init_name = "fallback device",
	.coherent_dma_mask = ISA_DMA_BIT_MASK,
	.dma_mask = &x86_dma_fallback_dev.coherent_dma_mask,
};
EXPORT_SYMBOL(x86_dma_fallback_dev);

/* Number of entries preallocated for DMA-API debugging */
#define PREALLOC_DMA_DEBUG_ENTRIES       32768

int dma_set_mask(struct device *dev, u64 mask)
{
	if (!dev->dma_mask || !dma_supported(dev, mask))
		return -EIO;

	*dev->dma_mask = mask;

	return 0;
}
EXPORT_SYMBOL(dma_set_mask);

#if defined(CONFIG_X86_64) && !defined(CONFIG_NUMA) && !defined(CONFIG_XEN)
static __initdata void *dma32_bootmem_ptr;
static unsigned long dma32_bootmem_size __initdata = (128ULL<<20);

static int __init parse_dma32_size_opt(char *p)
{
	if (!p)
		return -EINVAL;
	dma32_bootmem_size = memparse(p, &p);
	return 0;
}
early_param("dma32_size", parse_dma32_size_opt);

void __init dma32_reserve_bootmem(void)
{
	unsigned long size, align;
	if (max_pfn <= MAX_DMA32_PFN)
		return;

	/*
	 * check aperture_64.c allocate_aperture() for reason about
	 * using 512M as goal
	 */
	align = 64ULL<<20;
	size = roundup(dma32_bootmem_size, align);
	dma32_bootmem_ptr = __alloc_bootmem_nopanic(size, align,
				 512ULL<<20);
	/*
	 * Kmemleak should not scan this block as it may not be mapped via the
	 * kernel direct mapping.
	 */
	kmemleak_ignore(dma32_bootmem_ptr);
	if (dma32_bootmem_ptr)
		dma32_bootmem_size = size;
	else
		dma32_bootmem_size = 0;
}
static void __init dma32_free_bootmem(void)
{

	if (max_pfn <= MAX_DMA32_PFN)
		return;

	if (!dma32_bootmem_ptr)
		return;

	free_bootmem(__pa(dma32_bootmem_ptr), dma32_bootmem_size);

	dma32_bootmem_ptr = NULL;
	dma32_bootmem_size = 0;
}
#else
void __init dma32_reserve_bootmem(void)
{
}
static void __init dma32_free_bootmem(void)
{
}

#endif

static struct dma_map_ops swiotlb_dma_ops = {
	.alloc_coherent = dma_generic_alloc_coherent,
	.free_coherent = dma_generic_free_coherent,
	.mapping_error = swiotlb_dma_mapping_error,
	.map_page = swiotlb_map_page,
	.unmap_page = swiotlb_unmap_page,
	.sync_single_for_cpu = swiotlb_sync_single_for_cpu,
	.sync_single_for_device = swiotlb_sync_single_for_device,
	.sync_single_range_for_cpu = swiotlb_sync_single_range_for_cpu,
	.sync_single_range_for_device = swiotlb_sync_single_range_for_device,
	.sync_sg_for_cpu = swiotlb_sync_sg_for_cpu,
	.sync_sg_for_device = swiotlb_sync_sg_for_device,
	.map_sg = swiotlb_map_sg_attrs,
	.unmap_sg = swiotlb_unmap_sg_attrs,
	.dma_supported = swiotlb_dma_supported
};

void __init pci_iommu_alloc(void)
{
	/* free the range so iommu could get some range less than 4G */
	dma32_free_bootmem();

	if (pci_swiotlb_detect())
		goto out;

	gart_iommu_hole_init();

	detect_calgary();

	detect_intel_iommu();

	/* needs to be called after gart_iommu_hole_init */
	amd_iommu_detect();
out:
	swiotlb_init(1);
	if (swiotlb) {
		printk(KERN_INFO "PCI-DMA: Using software bounce buffering for IO (SWIOTLB)\n");
		dma_ops = &swiotlb_dma_ops;
	}
}

void *dma_generic_alloc_coherent(struct device *dev, size_t size,
				 dma_addr_t *dma_addr, gfp_t flag)
{
	unsigned long dma_mask;
	struct page *page;
#ifndef CONFIG_XEN
	dma_addr_t addr;
#else
	void *memory;
#endif
	unsigned int order = get_order(size);

	dma_mask = dma_alloc_coherent_mask(dev, flag);

#ifndef CONFIG_XEN
	flag |= __GFP_ZERO;
again:
#else
	flag &= ~(__GFP_DMA | __GFP_DMA32);
#endif
	page = alloc_pages_node(dev_to_node(dev), flag, order);
	if (!page)
		return NULL;

#ifndef CONFIG_XEN
	addr = page_to_phys(page);
	if (addr + size > dma_mask) {
		__free_pages(page, order);

		if (dma_mask < DMA_BIT_MASK(32) && !(flag & GFP_DMA)) {
			flag = (flag & ~GFP_DMA32) | GFP_DMA;
			goto again;
		}

		return NULL;
	}

	*dma_addr = addr;
	return page_address(page);
#else
	memory = page_address(page);
	if (xen_create_contiguous_region((unsigned long)memory, order,
					 fls64(dma_mask))) {
		__free_pages(page, order);
		return NULL;
	}

	*dma_addr = virt_to_bus(memory);
	return memset(memory, 0, size);
#endif
}

#ifdef CONFIG_XEN
void dma_generic_free_coherent(struct device *dev, size_t size, void *vaddr,
			       dma_addr_t dma_addr)
{
	unsigned int order = get_order(size);
	unsigned long va = (unsigned long)vaddr;

	xen_destroy_contiguous_region(va, order);
	free_pages(va, order);
}
#endif

/*
 * See <Documentation/x86_64/boot-options.txt> for the iommu kernel parameter
 * documentation.
 */
static __init int iommu_setup(char *p)
{
	iommu_merge = 1;

	if (!p)
		return -EINVAL;

	while (*p) {
		if (!strncmp(p, "off", 3))
			no_iommu = 1;
		/* gart_parse_options has more force support */
		if (!strncmp(p, "force", 5))
			force_iommu = 1;
		if (!strncmp(p, "noforce", 7)) {
			iommu_merge = 0;
			force_iommu = 0;
		}

		if (!strncmp(p, "biomerge", 8)) {
			iommu_merge = 1;
			force_iommu = 1;
		}
		if (!strncmp(p, "panic", 5))
			panic_on_overflow = 1;
		if (!strncmp(p, "nopanic", 7))
			panic_on_overflow = 0;
		if (!strncmp(p, "merge", 5)) {
			iommu_merge = 1;
			force_iommu = 1;
		}
		if (!strncmp(p, "nomerge", 7))
			iommu_merge = 0;
		if (!strncmp(p, "forcesac", 8))
			iommu_sac_force = 1;
		if (!strncmp(p, "allowdac", 8))
			forbid_dac = 0;
		if (!strncmp(p, "nodac", 5))
			forbid_dac = 1;
		if (!strncmp(p, "usedac", 6)) {
			forbid_dac = -1;
			return 1;
		}
#ifdef CONFIG_SWIOTLB
		if (!strncmp(p, "soft", 4))
			swiotlb = 1;
#endif
		if (!strncmp(p, "pt", 2))
			iommu_pass_through = 1;

		gart_parse_options(p);

#ifdef CONFIG_CALGARY_IOMMU
		if (!strncmp(p, "calgary", 7))
			use_calgary = 1;
#endif /* CONFIG_CALGARY_IOMMU */

		p += strcspn(p, ",");
		if (*p == ',')
			++p;
	}
	return 0;
}
early_param("iommu", iommu_setup);

static int check_pages_physically_contiguous(unsigned long pfn,
					     unsigned int offset,
					     size_t length)
{
	unsigned long next_mfn;
	int i;
	int nr_pages;

	next_mfn = pfn_to_mfn(pfn);
	nr_pages = (offset + length + PAGE_SIZE-1) >> PAGE_SHIFT;

	for (i = 1; i < nr_pages; i++) {
		if (pfn_to_mfn(++pfn) != ++next_mfn)
			return 0;
	}
	return 1;
}

int range_straddles_page_boundary(paddr_t p, size_t size)
{
	unsigned long pfn = p >> PAGE_SHIFT;
	unsigned int offset = p & ~PAGE_MASK;

	return ((offset + size > PAGE_SIZE) &&
		!check_pages_physically_contiguous(pfn, offset, size));
}

int dma_supported(struct device *dev, u64 mask)
{
	struct dma_map_ops *ops = get_dma_ops(dev);

#ifdef CONFIG_PCI
	if (mask > 0xffffffff && forbid_dac > 0) {
		dev_info(dev, "PCI: Disallowing DAC for device\n");
		return 0;
	}
#endif

	if (ops->dma_supported)
		return ops->dma_supported(dev, mask);

	/* Copied from i386. Doesn't make much sense, because it will
	   only work for pci_alloc_coherent.
	   The caller just has to use GFP_DMA in this case. */
	if (mask < DMA_BIT_MASK(24))
		return 0;

	/* Tell the device to use SAC when IOMMU force is on.  This
	   allows the driver to use cheaper accesses in some cases.

	   Problem with this is that if we overflow the IOMMU area and
	   return DAC as fallback address the device may not handle it
	   correctly.

	   As a special case some controllers have a 39bit address
	   mode that is as efficient as 32bit (aic79xx). Don't force
	   SAC for these.  Assume all masks <= 40 bits are of this
	   type. Normally this doesn't make any difference, but gives
	   more gentle handling of IOMMU overflow. */
	if (iommu_sac_force && (mask >= DMA_BIT_MASK(40))) {
		dev_info(dev, "Force SAC with mask %Lx\n", mask);
		return 0;
	}

	return 1;
}
EXPORT_SYMBOL(dma_supported);

static int __init pci_iommu_init(void)
{
	dma_debug_init(PREALLOC_DMA_DEBUG_ENTRIES);

#ifdef CONFIG_PCI
	dma_debug_add_bus(&pci_bus_type);
#endif
	x86_init.iommu.iommu_init();

#ifndef CONFIG_XEN
	if (swiotlb) {
		printk(KERN_INFO "PCI-DMA: "
		       "Using software bounce buffering for IO (SWIOTLB)\n");
		swiotlb_print_info();
	} else
		swiotlb_free();
#endif

	return 0;
}
/* Must execute after PCI subsystem */
rootfs_initcall(pci_iommu_init);

#ifdef CONFIG_PCI
/* Many VIA bridges seem to corrupt data for DAC. Disable it here */

static __devinit void via_no_dac(struct pci_dev *dev)
{
	if ((dev->class >> 8) == PCI_CLASS_BRIDGE_PCI && forbid_dac == 0) {
		dev_info(&dev->dev, "disabling DAC on VIA PCI bridge\n");
		forbid_dac = 1;
	}
}
DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_VIA, PCI_ANY_ID, via_no_dac);
#endif

[-- Attachment #3: Type: text/plain, Size: 138 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: Load increase after memory upgrade (part2)
  2011-11-28 15:40       ` Ian Campbell
  2011-11-28 16:45         ` Konrad Rzeszutek Wilk
  2011-11-28 16:58         ` Laszlo Ersek
@ 2011-11-29  9:37         ` Carsten Schiers
  2 siblings, 0 replies; 66+ messages in thread
From: Carsten Schiers @ 2011-11-29  9:37 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk, Ian Campbell
  Cc: xen-devel, lersek, zhenzhong.duan, konrad.wilk


[-- Attachment #1.1: Type: text/plain, Size: 459 bytes --]

> The swiotlb-xen used by classic-xen kernels (which I assume is what
> Carsten means by "Xenified") isn't exactly the same as the stuff in
> mainline Linux, it's been heavily refactored for one thing. It's not
> impossible that mainline is bouncing something it doesn't really need
> to.

Yes, it's a 2.6.34 kernel with Andrew Lyon's backported patches, found here:

  http://code.google.com/p/gentoo-xen-kernel/downloads/list

GrC.


[-- Attachment #1.2: Type: text/html, Size: 1388 bytes --]

[-- Attachment #2: Type: text/plain, Size: 138 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: Load increase after memory upgrade (part2)
  2011-11-28 15:30     ` Konrad Rzeszutek Wilk
@ 2011-11-29  9:42       ` Carsten Schiers
  0 siblings, 0 replies; 66+ messages in thread
From: Carsten Schiers @ 2011-11-29  9:42 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk; +Cc: xen-devel, konrad.wilk


[-- Attachment #1.1: Type: text/plain, Size: 519 bytes --]

 

> >   - with load I mean the %CPU of xentop
> >   - there is no change in CPU usage of the DomU or Dom0
> 
> Uhh, which metric are you using for that? CPU usage...? Is this if you
> change the DomU or the amount of memory the guest has? This is not
> the load number (the xentop value)?

I had a quick look into the munin plugin. It reads the output of "xm li" (xm list),
takes the Time in seconds, and normalizes it. But the effect is also visible in the
CPU(%) column of xentop, if the DomU is under higher load.

BR, C.


[-- Attachment #1.2: Type: text/html, Size: 1430 bytes --]

[-- Attachment #2: Type: text/plain, Size: 138 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: Load increase after memory upgrade (part2)
  2011-11-28 16:45         ` Konrad Rzeszutek Wilk
  2011-11-29  8:31           ` Jan Beulich
@ 2011-11-29  9:46           ` Carsten Schiers
  2011-11-29 10:23           ` Ian Campbell
  2 siblings, 0 replies; 66+ messages in thread
From: Carsten Schiers @ 2011-11-29  9:46 UTC (permalink / raw)
  To: Ian Campbell, Konrad Rzeszutek Wilk
  Cc: Konrad Rzeszutek Wilk, xen-devel, lersek, zhenzhong.duan


[-- Attachment #1.1: Type: text/plain, Size: 407 bytes --]

> Carsten, let me prep up a patch that will print some diagnostic information
> at runtime - to see how often it does the bounce, the usage, etc.

Yep, looking forward to trying it. I can include the patch into any kernel. 2.6.18 would be
a bit difficult though, as the driver pack isn't compatible any longer... so I'd prefer
2.6.34 Xenified vs. 3.1.2 pvops.

BR, C.


[-- Attachment #1.2: Type: text/html, Size: 1292 bytes --]

[-- Attachment #2: Type: text/plain, Size: 138 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: Load increase after memory upgrade (part2)
  2011-11-28 16:45         ` Konrad Rzeszutek Wilk
  2011-11-29  8:31           ` Jan Beulich
  2011-11-29  9:46           ` Carsten Schiers
@ 2011-11-29 10:23           ` Ian Campbell
  2011-11-29 15:33             ` Konrad Rzeszutek Wilk
  2 siblings, 1 reply; 66+ messages in thread
From: Ian Campbell @ 2011-11-29 10:23 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk
  Cc: Konrad Rzeszutek Wilk, xen-devel, Carsten Schiers,
	zhenzhong.duan, lersek

On Mon, 2011-11-28 at 16:45 +0000, Konrad Rzeszutek Wilk wrote:
> On Mon, Nov 28, 2011 at 03:40:13PM +0000, Ian Campbell wrote:
> > On Mon, 2011-11-28 at 15:28 +0000, Konrad Rzeszutek Wilk wrote:
> > > On Fri, Nov 25, 2011 at 11:11:55PM +0100, Carsten Schiers wrote:
> > 
> > > > I looked through my old mails from you and you explained already the necessity of double
> > > > bounce buffering (PCI->below 4GB->above 4GB). What I don't understand is: why does the
> > > > Xenified kernel not have this kind of issue?
> > > 
> > > That is a puzzle. It should not. The code is very much the same - both
> > > use the generic SWIOTLB which has not changed for years.
> > 
> > The swiotlb-xen used by classic-xen kernels (which I assume is what
> > Carsten means by "Xenified") isn't exactly the same as the stuff in
> > mainline Linux, it's been heavily refactored for one thing. It's not
> > impossible that mainline is bouncing something it doesn't really need
> > to.
> 
> The usage, at least with 'pci_alloc_coherent' is that there is no bouncing
> being done. The alloc_coherent will allocate a nice page, underneath the 4GB
> mark and give it to the driver. The driver can use it as it wishes and there
> is no need to bounce buffer.

Oh, I didn't realise dma_alloc_coherent was part of swiotlb now. Only a
subset of swiotlb is in use then, all the bouncing stuff _should_ be
idle/unused -- but has that been confirmed?

> 
> But I can't find the implementation of that in the classic Xen-SWIOTLB. It looks
> as if it is using map_single which would be taking the memory out of the
> pool for a very long time, instead of allocating memory and "swizzling" the MFNs.
> [Note, I looked at the 2.6.18 hg tree for classic, the 2.6.34 is probably
> improved much better so let me check that]
> 
> Carsten, let me prep up a patch that will print some diagnostic information
> during the runtime - to see how often it does the bounce, the usage, etc..
> 
> > 
> > It's also possible that the dma mask of the device is different/wrong in
> > mainline leading to such additional bouncing.
> 
> If one were to use map_page and such - yes. But the alloc_coherent bypasses
> that and ends up allocating it right under the 4GB (or rather it allocates
> based on the dev->coherent_mask and swizzles the MFNs as required).
> 
> > 
> > I guess it's also possible that the classic-Xen kernels are playing fast
> > and loose by not bouncing something they should (although if so they
> > appear to be getting away with it...) or that there is some difference
> > which really means mainline needs to bounce while classic-Xen doesn't.
> 
> <nods> Could be very well.
> > 
> > Ian.
> > 

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: Load increase after memory upgrade (part2)
  2011-11-29 10:23           ` Ian Campbell
@ 2011-11-29 15:33             ` Konrad Rzeszutek Wilk
  2011-12-02 15:23               ` Konrad Rzeszutek Wilk
  0 siblings, 1 reply; 66+ messages in thread
From: Konrad Rzeszutek Wilk @ 2011-11-29 15:33 UTC (permalink / raw)
  To: Ian Campbell
  Cc: Konrad Rzeszutek Wilk, xen-devel, Carsten Schiers,
	zhenzhong.duan, lersek

On Tue, Nov 29, 2011 at 10:23:18AM +0000, Ian Campbell wrote:
> On Mon, 2011-11-28 at 16:45 +0000, Konrad Rzeszutek Wilk wrote:
> > On Mon, Nov 28, 2011 at 03:40:13PM +0000, Ian Campbell wrote:
> > > On Mon, 2011-11-28 at 15:28 +0000, Konrad Rzeszutek Wilk wrote:
> > > > On Fri, Nov 25, 2011 at 11:11:55PM +0100, Carsten Schiers wrote:
> > > 
> > > > > I looked through my old mails from you and you explained already the necessity of double
> > > > > bounce buffering (PCI->below 4GB->above 4GB). What I don't understand is: why does the
> > > > > Xenified kernel not have this kind of issue?
> > > > 
> > > > That is a puzzle. It should not. The code is very much the same - both
> > > > use the generic SWIOTLB which has not changed for years.
> > > 
> > > The swiotlb-xen used by classic-xen kernels (which I assume is what
> > > Carsten means by "Xenified") isn't exactly the same as the stuff in
> > > mainline Linux, it's been heavily refactored for one thing. It's not
> > > impossible that mainline is bouncing something it doesn't really need
> > > to.
> > 
> > The usage, at least with 'pci_alloc_coherent' is that there is no bouncing
> > being done. The alloc_coherent will allocate a nice page, underneath the 4GB
> > mark and give it to the driver. The driver can use it as it wishes and there
> > is no need to bounce buffer.
> 
> Oh, I didn't realise dma_alloc_coherent was part of swiotlb now. Only a
> subset of swiotlb is in use then, all the bouncing stuff _should_ be
> idle/unused -- but has that been confirmed?

Nope. I hope that the diagnostic patch I have in mind will prove/disprove that.
Now I just need to find a moment to write it :-)

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: Load increase after memory upgrade (part2)
  2011-11-29 15:33             ` Konrad Rzeszutek Wilk
@ 2011-12-02 15:23               ` Konrad Rzeszutek Wilk
  2011-12-04 11:59                 ` Carsten Schiers
                                   ` (2 more replies)
  0 siblings, 3 replies; 66+ messages in thread
From: Konrad Rzeszutek Wilk @ 2011-12-02 15:23 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk
  Cc: lersek, xen-devel, Carsten Schiers, Ian Campbell, zhenzhong.duan

[-- Attachment #1: Type: text/plain, Size: 1402 bytes --]

> > > > > That is a puzzle. It should not. The code is very much the same - both
> > > > > use the generic SWIOTLB which has not changed for years.
> > > > 
> > > > The swiotlb-xen used by classic-xen kernels (which I assume is what
> > > > Carsten means by "Xenified") isn't exactly the same as the stuff in
> > > > mainline Linux, it's been heavily refactored for one thing. It's not
> > > > impossible that mainline is bouncing something it doesn't really need
> > > > to.
> > > 
> > > The usage, at least with 'pci_alloc_coherent' is that there is no bouncing
> > > being done. The alloc_coherent will allocate a nice page, underneath the 4GB
> > > mark and give it to the driver. The driver can use it as it wishes and there
> > > is no need to bounce buffer.
> > 
> > Oh, I didn't realise dma_alloc_coherent was part of swiotlb now. Only a
> > subset of swiotlb is in use then, all the bouncing stuff _should_ be
> > idle/unused -- but has that been confirmed?
> 
> Nope. I hope that the diagnostic patch I have in mind will prove/disprove that.
> Now I just need to find a moment to write it :-)

Done!

Carsten, can you please patch your kernel with this hacky patch and
when you have booted the new kernel, just do

modprobe dump_swiotlb

it should give an idea of how many bounces are happening, coherent
allocations, syncs, and so on.. along with the last driver that
did those operations.
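
And when you are done just

rmmod dump_swiotlb

to stop the debug threads again - that is what the module_exit path in the
patch is for.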


[-- Attachment #2: swiotlb-debug.patch --]
[-- Type: text/plain, Size: 10546 bytes --]

diff --git a/drivers/xen/Kconfig b/drivers/xen/Kconfig
index a59638b..d513c8d 100644
--- a/drivers/xen/Kconfig
+++ b/drivers/xen/Kconfig
@@ -105,4 +105,10 @@ config SWIOTLB_XEN
 	depends on PCI
 	select SWIOTLB
 
+config SWIOTLB_DEBUG
+	tristate "swiotlb debug facility"
+	default m
+	help
+	  Do not enable it unless you know what you are doing.
+
 endmenu
diff --git a/drivers/xen/Makefile b/drivers/xen/Makefile
index bbc1825..1dea490 100644
--- a/drivers/xen/Makefile
+++ b/drivers/xen/Makefile
@@ -16,6 +16,7 @@ obj-$(CONFIG_XENFS)			+= xenfs/
 obj-$(CONFIG_XEN_SYS_HYPERVISOR)	+= sys-hypervisor.o
 obj-$(CONFIG_XEN_PLATFORM_PCI)		+= xen-platform-pci.o
 obj-$(CONFIG_SWIOTLB_XEN)		+= swiotlb-xen.o
+obj-$(CONFIG_SWIOTLB_DEBUG)		+= dump_swiotlb.o
 obj-$(CONFIG_XEN_DOM0)			+= pci.o
 
 xen-evtchn-y				:= evtchn.o
diff --git a/drivers/xen/dump_swiotlb.c b/drivers/xen/dump_swiotlb.c
new file mode 100644
index 0000000..e0e4a0b
--- /dev/null
+++ b/drivers/xen/dump_swiotlb.c
@@ -0,0 +1,72 @@
+/*
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License v2.0 as published by
+ * the Free Software Foundation
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ */
+
+#include <linux/module.h>
+#include <linux/string.h>
+#include <linux/types.h>
+#include <linux/init.h>
+#include <linux/stat.h>
+#include <linux/err.h>
+#include <linux/ctype.h>
+#include <linux/slab.h>
+#include <linux/limits.h>
+#include <linux/device.h>
+#include <linux/pci.h>
+#include <linux/blkdev.h>
+#include <linux/device.h>
+
+#include <linux/init.h>
+#include <linux/mm.h>
+#include <linux/fcntl.h>
+#include <linux/slab.h>
+#include <linux/kmod.h>
+#include <linux/major.h>
+#include <linux/highmem.h>
+#include <linux/blkdev.h>
+#include <linux/module.h>
+#include <linux/blkpg.h>
+#include <linux/buffer_head.h>
+#include <linux/mpage.h>
+#include <linux/mount.h>
+#include <linux/uio.h>
+#include <linux/namei.h>
+#include <asm/uaccess.h>
+
+#include <linux/pagemap.h>
+#include <linux/pagevec.h>
+
+#include <linux/swiotlb.h>
+#define DUMP_SWIOTLB_FUN  "0.1"
+
+MODULE_AUTHOR("Konrad Rzeszutek Wilk <konrad@darnok.org>");
+MODULE_DESCRIPTION("dump swiotlb");
+MODULE_LICENSE("GPL");
+MODULE_VERSION(DUMP_SWIOTLB_FUN);
+
+extern int xen_swiotlb_start_thread(void);
+extern void xen_swiotlb_stop_thread(void);
+static int __init dump_swiotlb_init(void)
+{
+	printk(KERN_INFO "Starting SWIOTLB debug thread.\n");
+	swiotlb_start_thread();
+	xen_swiotlb_start_thread();
+	return 0;
+}
+
+static void __exit dump_swiotlb_exit(void)
+{
+	swiotlb_stop_thread();
+	xen_swiotlb_stop_thread();
+}
+
+module_init(dump_swiotlb_init);
+module_exit(dump_swiotlb_exit);
diff --git a/drivers/xen/swiotlb-xen.c b/drivers/xen/swiotlb-xen.c
index 6e8c15a..c833501 100644
--- a/drivers/xen/swiotlb-xen.c
+++ b/drivers/xen/swiotlb-xen.c
@@ -143,6 +143,63 @@ xen_swiotlb_fixup(void *buf, size_t size, unsigned long nslabs)
 	return 0;
 }
 
+#include <linux/percpu.h>
+struct xen_swiotlb_debug {
+	unsigned long alloc;
+	unsigned long dealloc;
+	char dev_name[64];
+};
+static DEFINE_PER_CPU(struct xen_swiotlb_debug, xen_tlb_debug);
+#include <linux/kthread.h>
+static int xen_swiotlb_debug_thread(void *arg)
+{
+	int cpu;
+	do {
+		set_current_state(TASK_INTERRUPTIBLE);
+		schedule_timeout_interruptible(HZ*10);
+
+		for_each_online_cpu(cpu) {
+			struct xen_swiotlb_debug *d;
+
+			d = &per_cpu(xen_tlb_debug, cpu);
+			/* Can't really happen. */
+			if (!d)
+				continue;
+
+			if (d->dev_name[0] == 0)
+				continue;
+
+			printk(KERN_INFO "%u %s alloc coherent: %ld, free: %ld\n",
+				cpu,
+				d->dev_name ? d->dev_name : "?",
+				d->alloc, d->dealloc);
+
+			memset(d, 0, sizeof(struct xen_swiotlb_debug));
+		}
+
+	} while (!kthread_should_stop());
+	return 0;
+}
+static struct task_struct *xen_debug_thread = NULL;
+
+int xen_swiotlb_start_thread(void) {
+
+	if (xen_debug_thread)
+		return -EINVAL;
+	printk(KERN_INFO "%s: Go!\n",__func__);
+	xen_debug_thread =  kthread_run(xen_swiotlb_debug_thread, NULL, "xen_swiotlb_debug");
+	return 0;
+}
+EXPORT_SYMBOL_GPL(xen_swiotlb_start_thread);
+void xen_swiotlb_stop_thread(void) {
+
+	printk(KERN_INFO "%s: Stop!\n",__func__);
+	if (xen_debug_thread)
+		kthread_stop(xen_debug_thread);
+	xen_debug_thread = NULL;
+}
+EXPORT_SYMBOL_GPL(xen_swiotlb_stop_thread);
+
 void __init xen_swiotlb_init(int verbose)
 {
 	unsigned long bytes;
@@ -194,7 +251,14 @@ xen_swiotlb_alloc_coherent(struct device *hwdev, size_t size,
 	int order = get_order(size);
 	u64 dma_mask = DMA_BIT_MASK(32);
 	unsigned long vstart;
-
+	struct xen_swiotlb_debug *d;
+
+	preempt_disable();
+	d = &__get_cpu_var(xen_tlb_debug);
+	preempt_enable();
+	d->alloc++;
+	snprintf(d->dev_name, sizeof(d->dev_name), "%s %s",
+                dev_driver_string(hwdev), dev_name(hwdev));
 	/*
 	* Ignore region specifiers - the kernel's ideas of
 	* pseudo-phys memory layout has nothing to do with the
@@ -230,6 +294,14 @@ xen_swiotlb_free_coherent(struct device *hwdev, size_t size, void *vaddr,
 			  dma_addr_t dev_addr)
 {
 	int order = get_order(size);
+	struct xen_swiotlb_debug *d;
+
+	preempt_disable();
+	d = &__get_cpu_var(xen_tlb_debug);
+	preempt_enable();
+	d->dealloc++;
+	snprintf(d->dev_name, sizeof(d->dev_name), "%s %s",
+                dev_driver_string(hwdev), dev_name(hwdev));
 
 	if (dma_release_from_coherent(hwdev, order, vaddr))
 		return;
diff --git a/include/linux/swiotlb.h b/include/linux/swiotlb.h
index 445702c..0d2e049 100644
--- a/include/linux/swiotlb.h
+++ b/include/linux/swiotlb.h
@@ -26,6 +26,8 @@ extern void swiotlb_init(int verbose);
 extern void swiotlb_init_with_tbl(char *tlb, unsigned long nslabs, int verbose);
 extern unsigned long swioltb_nr_tbl(void);
 
+extern int swiotlb_start_thread(void);
+extern void swiotlb_stop_thread(void);
 /*
  * Enumeration for sync targets
  */
diff --git a/lib/swiotlb.c b/lib/swiotlb.c
index 99093b3..5446076 100644
--- a/lib/swiotlb.c
+++ b/lib/swiotlb.c
@@ -92,6 +92,75 @@ static DEFINE_SPINLOCK(io_tlb_lock);
 
 static int late_alloc;
 
+#include <linux/percpu.h>
+struct swiotlb_debug {
+	unsigned long bounce_to;
+	unsigned long bounce_from;
+	unsigned long bounce_slow;
+	unsigned long map;
+	unsigned long unmap;
+	unsigned long sync;
+	char dev_name[64];
+};
+static DEFINE_PER_CPU(struct swiotlb_debug, tlb_debug);
+#include <linux/kthread.h>
+static int swiotlb_debug_thread(void *arg)
+{
+	int cpu;
+	int size = io_tlb_nslabs;
+	do {
+		int i;
+		unsigned long filled = 0;
+		set_current_state(TASK_INTERRUPTIBLE);
+		schedule_timeout_interruptible(HZ*5);
+
+		for_each_online_cpu(cpu) {
+			struct swiotlb_debug *d = &per_cpu(tlb_debug, cpu);
+			/* Can't really happen. */
+			if (!d)
+				continue;
+			if (d->dev_name[0] == 0)
+				continue;
+
+			printk(KERN_INFO "%d [%s] bounce: from:%ld(slow:%ld)to:%ld map:%ld unmap:%ld sync:%ld\n",
+				cpu,
+				d->dev_name ? d->dev_name : "?",
+				d->bounce_from,
+				d->bounce_slow,
+				d->bounce_to,
+				d->map, d->unmap, d->sync);
+			memset(d, 0, sizeof(struct swiotlb_debug));
+		}
+		/* Very crude calculation. */
+		for (i = 0; i < size; i++) {
+			if (io_tlb_list[i] == 0)
+				filled++;
+		}
+		printk(KERN_INFO "SWIOTLB is %ld%% full\n", (filled * 100) / size);
+
+	} while (!kthread_should_stop());
+	return 0;
+}
+static struct task_struct *debug_thread = NULL;
+
+int swiotlb_start_thread(void) {
+
+	if (debug_thread)
+		return -EINVAL;
+	printk(KERN_INFO "%s: Go!\n",__func__);
+	debug_thread = kthread_run(swiotlb_debug_thread, NULL, "swiotlb_debug");
+	return 0;
+}
+EXPORT_SYMBOL_GPL(swiotlb_start_thread);
+void swiotlb_stop_thread(void) {
+
+	printk(KERN_INFO "%s: Stop!\n",__func__);
+	if (debug_thread)
+		kthread_stop(debug_thread);
+	debug_thread = NULL;
+}
+EXPORT_SYMBOL_GPL(swiotlb_stop_thread);
+
 static int __init
 setup_io_tlb_npages(char *str)
 {
@@ -336,6 +405,7 @@ void swiotlb_bounce(phys_addr_t phys, char *dma_addr, size_t size,
 		    enum dma_data_direction dir)
 {
 	unsigned long pfn = PFN_DOWN(phys);
+	struct swiotlb_debug *d = &__get_cpu_var(tlb_debug);
 
 	if (PageHighMem(pfn_to_page(pfn))) {
 		/* The buffer does not have a mapping.  Map it in and copy */
@@ -362,11 +432,16 @@ void swiotlb_bounce(phys_addr_t phys, char *dma_addr, size_t size,
 			dma_addr += sz;
 			offset = 0;
 		}
+		d->bounce_slow++;
 	} else {
-		if (dir == DMA_TO_DEVICE)
+		if (dir == DMA_TO_DEVICE) {
 			memcpy(dma_addr, phys_to_virt(phys), size);
-		else
+			d->bounce_to++;
+		}
+		else {
 			memcpy(phys_to_virt(phys), dma_addr, size);
+			d->bounce_from++;
+		}
 	}
 }
 EXPORT_SYMBOL_GPL(swiotlb_bounce);
@@ -471,7 +546,15 @@ found:
 		io_tlb_orig_addr[index+i] = phys + (i << IO_TLB_SHIFT);
 	if (dir == DMA_TO_DEVICE || dir == DMA_BIDIRECTIONAL)
 		swiotlb_bounce(phys, dma_addr, size, DMA_TO_DEVICE);
-
+	{
+		struct swiotlb_debug *d;
+		preempt_disable();
+		d = &__get_cpu_var(tlb_debug);
+		preempt_enable();
+		d->map++;
+		snprintf(d->dev_name, sizeof(d->dev_name), "%s %s",
+                	dev_driver_string(hwdev), dev_name(hwdev));
+	}
 	return dma_addr;
 }
 EXPORT_SYMBOL_GPL(swiotlb_tbl_map_single);
@@ -531,6 +614,15 @@ swiotlb_tbl_unmap_single(struct device *hwdev, char *dma_addr, size_t size,
 			io_tlb_list[i] = ++count;
 	}
 	spin_unlock_irqrestore(&io_tlb_lock, flags);
+	{
+		struct swiotlb_debug *d;
+		preempt_disable();
+		d = &__get_cpu_var(tlb_debug);
+		preempt_enable();
+		d->unmap++;
+		snprintf(d->dev_name, sizeof(d->dev_name), "%s %s",
+                	dev_driver_string(hwdev), dev_name(hwdev));
+	}
 }
 EXPORT_SYMBOL_GPL(swiotlb_tbl_unmap_single);
 
@@ -541,7 +633,13 @@ swiotlb_tbl_sync_single(struct device *hwdev, char *dma_addr, size_t size,
 {
 	int index = (dma_addr - io_tlb_start) >> IO_TLB_SHIFT;
 	phys_addr_t phys = io_tlb_orig_addr[index];
-
+	struct swiotlb_debug *d;
+	preempt_disable();
+	d = &__get_cpu_var(tlb_debug);
+	preempt_enable();
+	d->sync++;
+	snprintf(d->dev_name, sizeof(d->dev_name), "%s %s",
+               	dev_driver_string(hwdev), dev_name(hwdev));
 	phys += ((unsigned long)dma_addr & ((1 << IO_TLB_SHIFT) - 1));
 
 	switch (target) {

[-- Attachment #3: Type: text/plain, Size: 138 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel

^ permalink raw reply related	[flat|nested] 66+ messages in thread

* Re: Load increase after memory upgrade (part2)
  2011-12-02 15:23               ` Konrad Rzeszutek Wilk
@ 2011-12-04 11:59                 ` Carsten Schiers
  2011-12-04 12:09                 ` Carsten Schiers
  2011-12-04 12:18                 ` Carsten Schiers
  2 siblings, 0 replies; 66+ messages in thread
From: Carsten Schiers @ 2011-12-04 11:59 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk, Konrad Rzeszutek Wilk
  Cc: xen-devel, lersek, Ian Campbell, zhenzhong.duan

Thank you, Konrad.

I applied the patch to 3.1.2. In order to have a clear picture, I only enabled one PCI card.
The result is:

[   28.028032] Starting SWIOTLB debug thread.
[   28.028076] swiotlb_start_thread: Go!
[   28.028622] xen_swiotlb_start_thread: Go!
[   33.028153] 0 [budget_av 0000:00:00.0] bounce: from:555352(slow:0)to:0 map:329 unmap:0 sync:555352
[   33.028294] SWIOTLB is 2% full
[   38.028178] 0 budget_av 0000:00:00.0 alloc coherent: 4, free: 0
[   38.028230] 0 [budget_av 0000:00:00.0] bounce: from:127981(slow:0)to:0 map:0 unmap:0 sync:127981
[   38.028352] SWIOTLB is 2% full
[   43.028170] 0 [budget_av 0000:00:00.0] bounce: from:128310(slow:0)to:0 map:0 unmap:0 sync:128310
[   43.028310] SWIOTLB is 2% full
[   48.028199] 0 [budget_av 0000:00:00.0] bounce: from:127981(slow:0)to:0 map:0 unmap:0 sync:127981
[   48.028334] SWIOTLB is 2% full
[   53.028170] 0 [budget_av 0000:00:00.0] bounce: from:128310(slow:0)to:0 map:0 unmap:0 sync:128310
[   53.028309] SWIOTLB is 2% full
[   58.028138] 0 [budget_av 0000:00:00.0] bounce: from:126994(slow:0)to:0 map:0 unmap:0 sync:126994
[   58.028195] SWIOTLB is 2% full
[   63.028170] 0 [budget_av 0000:00:00.0] bounce: from:121401(slow:0)to:0 map:0 unmap:0 sync:121401
[   63.029560] SWIOTLB is 2% full
[   68.028193] 0 [budget_av 0000:00:00.0] bounce: from:127981(slow:0)to:0 map:0 unmap:0 sync:127981
[   68.028329] SWIOTLB is 2% full
[   73.028104] 0 [budget_av 0000:00:00.0] bounce: from:122717(slow:0)to:0 map:0 unmap:0 sync:122717
[   73.028244] SWIOTLB is 2% full
[   78.028191] 0 [budget_av 0000:00:00.0] bounce: from:127981(slow:0)to:0 map:0 unmap:0 sync:127981
[   78.028331] SWIOTLB is 2% full
[   83.028112] 0 [budget_av 0000:00:00.0] bounce: from:128310(slow:0)to:0 map:0 unmap:0 sync:128310
[   83.028171] SWIOTLB is 2% full

Was that long enough? I hope this helps.

Carsten.

-----Original Message-----
From: Konrad Rzeszutek Wilk [mailto:konrad@darnok.org] 
Sent: Friday, 2 December 2011 16:24
To: Konrad Rzeszutek Wilk
Cc: Ian Campbell; xen-devel; Carsten Schiers; zhenzhong.duan@oracle.com; lersek@redhat.com
Subject: Re: [Xen-devel] Load increase after memory upgrade (part2)

> > > > > That is a puzzle. It should not. The code is very much the same - both
> > > > > use the generic SWIOTLB which has not changed for years.
> > > > 
> > > > The swiotlb-xen used by classic-xen kernels (which I assume is what
> > > > Carsten means by "Xenified") isn't exactly the same as the stuff in
> > > > mainline Linux, it's been heavily refactored for one thing. It's not
> > > > impossible that mainline is bouncing something it doesn't really need
> > > > to.
> > > 
> > > The usage, at least with 'pci_alloc_coherent' is that there is no bouncing
> > > being done. The alloc_coherent will allocate a nice page, underneath the 4GB
> > > mark and give it to the driver. The driver can use it as it wishes and there
> > > is no need to bounce buffer.
> > 
> > Oh, I didn't realise dma_alloc_coherent was part of swiotlb now. Only a
> > subset of swiotlb is in use then, all the bouncing stuff _should_ be
> > idle/unused -- but has that been confirmed?
> 
> Nope. I hope that the diagnostic patch I have in mind will prove/disprove that.
> Now I just need to find a moment to write it :-)

Done!

Carsten, can you please patch your kernel with this hacky patch and
when you have booted the new kernel, just do

modprobe dump_swiotlb

it should give an idea of how many bounces are happening, coherent
allocations, syncs, and so on.. along with the last driver that
did those operations.

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: Load increase after memory upgrade (part2)
  2011-12-02 15:23               ` Konrad Rzeszutek Wilk
  2011-12-04 11:59                 ` Carsten Schiers
@ 2011-12-04 12:09                 ` Carsten Schiers
  2011-12-06  3:26                   ` Konrad Rzeszutek Wilk
  2011-12-04 12:18                 ` Carsten Schiers
  2 siblings, 1 reply; 66+ messages in thread
From: Carsten Schiers @ 2011-12-04 12:09 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk, Konrad Rzeszutek Wilk
  Cc: xen-devel, lersek, Ian Campbell, zhenzhong.duan

Here with two cards enabled, creating a bit of "work" by watching TV with one of them:

[   23.842720] Starting SWIOTLB debug thread.
[   23.842750] swiotlb_start_thread: Go!
[   23.842838] xen_swiotlb_start_thread: Go!
[   28.841451] 0 [budget_av 0000:00:01.0] bounce: from:435596(slow:0)to:0 map:658 unmap:0 sync:435596
[   28.841592] SWIOTLB is 4% full
[   33.840147] 0 [budget_av 0000:00:01.0] bounce: from:127652(slow:0)to:0 map:0 unmap:0 sync:127652
[   33.840283] SWIOTLB is 4% full
[   33.844222] 0 budget_av 0000:00:01.0 alloc coherent: 8, free: 0
[   38.840227] 0 [budget_av 0000:00:01.0] bounce: from:128310(slow:0)to:0 map:0 unmap:0 sync:128310
[   38.840361] SWIOTLB is 4% full
[   43.840182] 0 [budget_av 0000:00:01.0] bounce: from:128310(slow:0)to:0 map:0 unmap:0 sync:128310
[   43.840323] SWIOTLB is 4% full
[   48.840094] 0 [budget_av 0000:00:01.0] bounce: from:127652(slow:0)to:0 map:0 unmap:0 sync:127652
[   48.840154] SWIOTLB is 4% full
[   53.840160] 0 [budget_av 0000:00:01.0] bounce: from:119756(slow:0)to:0 map:0 unmap:0 sync:119756
[   53.840301] SWIOTLB is 4% full
[   58.840202] 0 [budget_av 0000:00:01.0] bounce: from:128310(slow:0)to:0 map:0 unmap:0 sync:128310
[   58.840339] SWIOTLB is 4% full
[   63.840626] 0 [budget_av 0000:00:01.0] bounce: from:128310(slow:0)to:0 map:0 unmap:0 sync:128310
[   63.840686] SWIOTLB is 4% full
[   68.840122] 0 [budget_av 0000:00:01.0] bounce: from:127323(slow:0)to:0 map:0 unmap:0 sync:127323
[   68.840180] SWIOTLB is 4% full
[   73.840647] 0 [budget_av 0000:00:01.0] bounce: from:211547(slow:0)to:0 map:0 unmap:0 sync:211547
[   73.840784] SWIOTLB is 4% full
[   78.840204] 0 [budget_av 0000:00:01.0] bounce: from:255962(slow:0)to:0 map:0 unmap:0 sync:255962
[   78.840344] SWIOTLB is 4% full
[   83.840114] 0 [budget_av 0000:00:01.0] bounce: from:255304(slow:0)to:0 map:0 unmap:0 sync:255304
[   83.840178] SWIOTLB is 4% full
[   88.840158] 0 [budget_av 0000:00:01.0] bounce: from:256620(slow:0)to:0 map:0 unmap:0 sync:256620
[   88.840302] SWIOTLB is 4% full
[   93.840185] 0 [budget_av 0000:00:00.0] bounce: from:250040(slow:0)to:0 map:0 unmap:0 sync:250040
[   93.840319] SWIOTLB is 4% full
[   98.840181] 0 [budget_av 0000:00:00.0] bounce: from:255962(slow:0)to:0 map:0 unmap:0 sync:255962
[   98.841563] SWIOTLB is 4% full
[  103.841221] 0 [budget_av 0000:00:00.0] bounce: from:255962(slow:0)to:0 map:0 unmap:0 sync:255962
[  103.841361] SWIOTLB is 4% full
[  108.840247] 0 [budget_av 0000:00:00.0] bounce: from:255962(slow:0)to:0 map:0 unmap:0 sync:255962
[  108.840389] SWIOTLB is 4% full
[  113.840157] 0 [budget_av 0000:00:00.0] bounce: from:261555(slow:0)to:0 map:0 unmap:0 sync:261555
[  113.840298] SWIOTLB is 4% full
[  118.840119] 0 [budget_av 0000:00:00.0] bounce: from:295442(slow:0)to:0 map:0 unmap:0 sync:295442
[  118.840259] SWIOTLB is 4% full
[  123.841025] 0 [budget_av 0000:00:00.0] bounce: from:295113(slow:0)to:0 map:0 unmap:0 sync:295113
[  123.841164] SWIOTLB is 4% full
[  128.840175] 0 [budget_av 0000:00:00.0] bounce: from:294784(slow:0)to:0 map:0 unmap:0 sync:294784
[  128.840310] SWIOTLB is 4% full
[  133.840194] 0 [budget_av 0000:00:00.0] bounce: from:293797(slow:0)to:0 map:0 unmap:0 sync:293797
[  133.840330] SWIOTLB is 4% full
[  138.840498] 0 [budget_av 0000:00:00.0] bounce: from:341502(slow:0)to:0 map:0 unmap:0 sync:341502
[  138.840637] SWIOTLB is 4% full
[  143.840173] 0 [budget_av 0000:00:00.0] bounce: from:341502(slow:0)to:0 map:0 unmap:0 sync:341502
[  143.840313] SWIOTLB is 4% full
[  148.840215] 0 [budget_av 0000:00:00.0] bounce: from:341831(slow:0)to:0 map:0 unmap:0 sync:341831
[  148.840355] SWIOTLB is 4% full
[  153.840205] 0 [budget_av 0000:00:01.0] bounce: from:329658(slow:0)to:0 map:0 unmap:0 sync:329658
[  153.840341] SWIOTLB is 4% full
[  158.840137] 0 [budget_av 0000:00:00.0] bounce: from:342160(slow:0)to:0 map:0 unmap:0 sync:342160
[  158.840277] SWIOTLB is 4% full
[  163.841288] 0 [budget_av 0000:00:00.0] bounce: from:341502(slow:0)to:0 map:0 unmap:0 sync:341502
[  163.841424] SWIOTLB is 4% full
[  168.840198] 0 [budget_av 0000:00:00.0] bounce: from:341502(slow:0)to:0 map:0 unmap:0 sync:341502
[  168.840339] SWIOTLB is 4% full
[  173.840167] 0 [budget_av 0000:00:00.0] bounce: from:341502(slow:0)to:0 map:0 unmap:0 sync:341502
[  173.840304] SWIOTLB is 4% full
[  178.840184] 0 [budget_av 0000:00:00.0] bounce: from:328013(slow:0)to:0 map:0 unmap:0 sync:328013
[  178.840324] SWIOTLB is 4% full
[  183.840129] 0 [budget_av 0000:00:00.0] bounce: from:341831(slow:0)to:0 map:0 unmap:0 sync:341831
[  183.840269] SWIOTLB is 4% full
[  188.840123] 0 [budget_av 0000:00:01.0] bounce: from:340515(slow:0)to:0 map:0 unmap:0 sync:340515
[  188.841647] SWIOTLB is 4% full
[  193.840192] 0 [budget_av 0000:00:00.0] bounce: from:338541(slow:0)to:0 map:0 unmap:0 sync:338541
[  193.840329] SWIOTLB is 4% full
[  198.840148] 0 [budget_av 0000:00:01.0] bounce: from:330316(slow:0)to:0 map:0 unmap:0 sync:330316
[  198.840230] SWIOTLB is 4% full
[  203.840860] 0 [budget_av 0000:00:00.0] bounce: from:341831(slow:0)to:0 map:0 unmap:0 sync:341831
[  203.841000] SWIOTLB is 4% full
[  208.840562] 0 [budget_av 0000:00:01.0] bounce: from:337883(slow:0)to:0 map:0 unmap:0 sync:337883
[  208.840698] SWIOTLB is 4% full
[  213.840171] 0 [budget_av 0000:00:00.0] bounce: from:341502(slow:0)to:0 map:0 unmap:0 sync:341502
[  213.840311] SWIOTLB is 4% full
[  218.840214] 0 [budget_av 0000:00:01.0] bounce: from:320117(slow:0)to:0 map:0 unmap:0 sync:320117
[  218.840354] SWIOTLB is 4% full
[  223.840238] 0 [budget_av 0000:00:01.0] bounce: from:299390(slow:0)to:0 map:0 unmap:0 sync:299390
[  223.840373] SWIOTLB is 4% full
[  228.841415] 0 [budget_av 0000:00:01.0] bounce: from:298732(slow:0)to:0 map:0 unmap:0 sync:298732
[  228.841560] SWIOTLB is 4% full
[  233.840705] 0 [budget_av 0000:00:00.0] bounce: from:299061(slow:0)to:0 map:0 unmap:0 sync:299061
[  233.840844] SWIOTLB is 4% full
[  238.840145] 0 [budget_av 0000:00:01.0] bounce: from:293468(slow:0)to:0 map:0 unmap:0 sync:293468
[  238.840280] SWIOTLB is 4% full

-----Original Message-----
From: Konrad Rzeszutek Wilk [mailto:konrad@darnok.org] 
Sent: Friday, 2 December 2011 16:24
To: Konrad Rzeszutek Wilk
Cc: Ian Campbell; xen-devel; Carsten Schiers; zhenzhong.duan@oracle.com; lersek@redhat.com
Subject: Re: [Xen-devel] Load increase after memory upgrade (part2)

> > > > > That is a puzzle. It should not. The code is very much the same - both
> > > > > use the generic SWIOTLB which has not changed for years.
> > > > 
> > > > The swiotlb-xen used by classic-xen kernels (which I assume is what
> > > > Carsten means by "Xenified") isn't exactly the same as the stuff in
> > > > mainline Linux, it's been heavily refactored for one thing. It's not
> > > > impossible that mainline is bouncing something it doesn't really need
> > > > to.
> > > 
> > > The usage, at least with 'pci_alloc_coherent' is that there is no bouncing
> > > being done. The alloc_coherent will allocate a nice page, underneath the 4GB
> > > mark and give it to the driver. The driver can use it as it wishes and there
> > > is no need to bounce buffer.
> > 
> > Oh, I didn't realise dma_alloc_coherent was part of swiotlb now. Only a
> > subset of swiotlb is in use then, all the bouncing stuff _should_ be
> > idle/unused -- but has that been confirmed?
> 
> Nope. I hope that the diagnostic patch I have in mind will prove/disprove that.
> Now I just need to find a moment to write it :-)

Done!

Carsten, can you please patch your kernel with this hacky patch and
when you have booted the new kernel, just do

modprobe dump_swiotlb

it should give an idea of how many bounces are happening, coherent
allocations, syncs, and so on.. along with the last driver that
did those operations.

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: Load increase after memory upgrade (part2)
  2011-12-02 15:23               ` Konrad Rzeszutek Wilk
  2011-12-04 11:59                 ` Carsten Schiers
  2011-12-04 12:09                 ` Carsten Schiers
@ 2011-12-04 12:18                 ` Carsten Schiers
  2 siblings, 0 replies; 66+ messages in thread
From: Carsten Schiers @ 2011-12-04 12:18 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk, Konrad Rzeszutek Wilk
  Cc: xen-devel, lersek, Ian Campbell, zhenzhong.duan

I should probably mention that I create the DomU with only the parameter iommu=soft. I hope
nothing more is required. For the Xenified kernel, it's swiotlb=32,force.
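
In the xm config that is just something like this (the first line is what I
use for the pvops DomU, the commented-out one would be the Xenified variant):

extra = "iommu=soft"
# extra = "swiotlb=32,force"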

Carsten.

-----Original Message-----
From: Konrad Rzeszutek Wilk [mailto:konrad@darnok.org] 
Sent: Friday, 2 December 2011 16:24
To: Konrad Rzeszutek Wilk
Cc: Ian Campbell; xen-devel; Carsten Schiers; zhenzhong.duan@oracle.com; lersek@redhat.com
Subject: Re: [Xen-devel] Load increase after memory upgrade (part2)

> > > > > That is a puzzle. It should not. The code is very much the same - both
> > > > > use the generic SWIOTLB which has not changed for years.
> > > > 
> > > > The swiotlb-xen used by classic-xen kernels (which I assume is what
> > > > Carsten means by "Xenified") isn't exactly the same as the stuff in
> > > > mainline Linux, it's been heavily refactored for one thing. It's not
> > > > impossible that mainline is bouncing something it doesn't really need
> > > > to.
> > > 
> > > The usage, at least with 'pci_alloc_coherent' is that there is no bouncing
> > > being done. The alloc_coherent will allocate a nice page, underneath the 4GB
> > > mark and give it to the driver. The driver can use it as it wishes and there
> > > is no need to bounce buffer.
> > 
> > Oh, I didn't realise dma_alloc_coherent was part of swiotlb now. Only a
> > subset of swiotlb is in use then, all the bouncing stuff _should_ be
> > idle/unused -- but has that been confirmed?
> 
> Nope. I hope that the diagnostic patch I have in mind will prove/disprove that.
> Now I just need to find a moment to write it :-)

Done!

Carsten, can you please patch your kernel with this hacky patch and
when you have booted the new kernel, just do

modprobe dump_swiotlb

it should give an idea of how many bounces are happening, coherent
allocations, syncs, and so on.. along with the last driver that
did those operations.

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: Load increase after memory upgrade (part2)
  2011-12-04 12:09                 ` Carsten Schiers
@ 2011-12-06  3:26                   ` Konrad Rzeszutek Wilk
  2011-12-14 20:23                     ` Konrad Rzeszutek Wilk
  0 siblings, 1 reply; 66+ messages in thread
From: Konrad Rzeszutek Wilk @ 2011-12-06  3:26 UTC (permalink / raw)
  To: Carsten Schiers
  Cc: Konrad Rzeszutek Wilk, xen-devel, lersek, Ian Campbell, zhenzhong.duan

On Sun, Dec 04, 2011 at 01:09:28PM +0100, Carsten Schiers wrote:
> Here with two cards enabled, creating a bit of "work" by watching TV with one of them:
> 
> [   23.842720] Starting SWIOTLB debug thread.
> [   23.842750] swiotlb_start_thread: Go!
> [   23.842838] xen_swiotlb_start_thread: Go!
> [   28.841451] 0 [budget_av 0000:00:01.0] bounce: from:435596(slow:0)to:0 map:658 unmap:0 sync:435596
> [   28.841592] SWIOTLB is 4% full
> [   33.840147] 0 [budget_av 0000:00:01.0] bounce: from:127652(slow:0)to:0 map:0 unmap:0 sync:127652
> [   33.840283] SWIOTLB is 4% full
> [   33.844222] 0 budget_av 0000:00:01.0 alloc coherent: 8, free: 0
> [   38.840227] 0 [budget_av 0000:00:01.0] bounce: from:128310(slow:0)to:0 map:0 unmap:0 sync:128310

Whoa. Yes. You are definitely using the bounce buffer :-)

Now it is time to look at why the driver is not using those coherent ones - it
looks to allocate just eight of them but does not use them.. Unless it is
using them _and_ bouncing them (which would be odd).

And BTW, you can lower your 'swiotlb=XX' value.  The 4% is how much you
are using of the default size.

I should find out _why_ the old Xen kernels do not use the bounce buffer
so much...
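
For reference, here is the difference in driver terms - a sketch only, not
the actual budget_av code (RING_SIZE and the buffer are made up):

#include <linux/pci.h>

#define RING_SIZE 4096

static int dma_styles_demo(struct pci_dev *pdev, void *buf, size_t len)
{
	dma_addr_t ring_dma, buf_dma;
	void *ring;

	/* Coherent: allocated once below the device's DMA mask (4GB for a
	 * 32-bit card), so swiotlb never has to bounce it. */
	ring = pci_alloc_coherent(pdev, RING_SIZE, &ring_dma);
	if (!ring)
		return -ENOMEM;

	/* Streaming: maps an existing buffer that may well sit above 4GB;
	 * with swiotlb that means a bounce copy on every map/sync. */
	buf_dma = pci_map_single(pdev, buf, len, PCI_DMA_FROMDEVICE);
	pci_dma_sync_single_for_cpu(pdev, buf_dma, len, PCI_DMA_FROMDEVICE);
	pci_unmap_single(pdev, buf_dma, len, PCI_DMA_FROMDEVICE);

	pci_free_coherent(pdev, RING_SIZE, ring, ring_dma);
	return 0;
}

Your log matches the second pattern for the data path, and the first for
those eight allocations.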

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: Load increase after memory upgrade (part2)
  2011-12-06  3:26                   ` Konrad Rzeszutek Wilk
@ 2011-12-14 20:23                     ` Konrad Rzeszutek Wilk
  2011-12-14 22:07                       ` Konrad Rzeszutek Wilk
  0 siblings, 1 reply; 66+ messages in thread
From: Konrad Rzeszutek Wilk @ 2011-12-14 20:23 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk
  Cc: lersek, xen-devel, Carsten Schiers, Ian Campbell, zhenzhong.duan

On Mon, Dec 05, 2011 at 10:26:21PM -0500, Konrad Rzeszutek Wilk wrote:
> On Sun, Dec 04, 2011 at 01:09:28PM +0100, Carsten Schiers wrote:
> > Here with two cards enabled, creating a bit of "work" by watching TV with one of them:
> > 
> > [   23.842720] Starting SWIOTLB debug thread.
> > [   23.842750] swiotlb_start_thread: Go!
> > [   23.842838] xen_swiotlb_start_thread: Go!
> > [   28.841451] 0 [budget_av 0000:00:01.0] bounce: from:435596(slow:0)to:0 map:658 unmap:0 sync:435596
> > [   28.841592] SWIOTLB is 4% full
> > [   33.840147] 0 [budget_av 0000:00:01.0] bounce: from:127652(slow:0)to:0 map:0 unmap:0 sync:127652
> > [   33.840283] SWIOTLB is 4% full
> > [   33.844222] 0 budget_av 0000:00:01.0 alloc coherent: 8, free: 0
> > [   38.840227] 0 [budget_av 0000:00:01.0] bounce: from:128310(slow:0)to:0 map:0 unmap:0 sync:128310
> 
> Whoa. Yes. You are definitely using the bounce buffer :-)
> 
> Now it is time to look at why the driver is not using those coherent ones - it
> looks to allocate just eight of them but does not use them.. Unless it is
> using them _and_ bouncing them (which would be odd).
> 
> And BTW, you can lower your 'swiotlb=XX' value.  The 4% is how much you
> are using of the default size.

So I was able to see this with an atl1c ethernet driver on my SandyBridge i3
box. It looks as if the card is truly 32-bit, so on a box with 8GB it
bounces the data. If I boot the Xen hypervisor with 'mem=4GB' I get no
bounces (no surprise there).

In other words - I see the same behavior you are seeing. Now off to:
> 
> I should find out _why_ the old Xen kernels do not use the bounce buffer
> so much...

which will require some fiddling around.
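
For the record, the decision that makes the 8GB case bounce is roughly this
- paraphrased from the lib/swiotlb.c map path of that era; dma_capable() and
swiotlb_force are the real symbols, bounce_map() stands in for the actual
map_single() call:

#include <linux/dma-mapping.h>	/* dma_capable(), phys_to_dma() */
#include <linux/io.h>		/* virt_to_phys() */

extern int swiotlb_force;	/* the 'swiotlb=force' knob */

static dma_addr_t bounce_map(struct device *dev, void *ptr, size_t size);
				/* stand-in: copy into the bounce pool */

static dma_addr_t map_or_bounce(struct device *dev, void *ptr, size_t size)
{
	dma_addr_t dev_addr = phys_to_dma(dev, virt_to_phys(ptr));

	/* Reachable by the device and no 'swiotlb=force'? Pass it through. */
	if (dma_capable(dev, dev_addr, size) && !swiotlb_force)
		return dev_addr;

	/* Otherwise copy in and out of the bounce pool below the mask. */
	return bounce_map(dev, ptr, size);
}

With 8GB in the box a 32-bit card fails the dma_capable() check for every
page above 4GB; with mem=4GB nothing can fail it.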

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: Load increase after memory upgrade (part2)
  2011-12-14 20:23                     ` Konrad Rzeszutek Wilk
@ 2011-12-14 22:07                       ` Konrad Rzeszutek Wilk
  2011-12-15 14:52                         ` Carsten Schiers
  2011-12-16 14:56                         ` Carsten Schiers
  0 siblings, 2 replies; 66+ messages in thread
From: Konrad Rzeszutek Wilk @ 2011-12-14 22:07 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk
  Cc: xen-devel, Ian Campbell, Carsten Schiers, zhenzhong.duan, linux, lersek

[-- Attachment #1: Type: text/plain, Size: 2328 bytes --]

On Wed, Dec 14, 2011 at 04:23:51PM -0400, Konrad Rzeszutek Wilk wrote:
> On Mon, Dec 05, 2011 at 10:26:21PM -0500, Konrad Rzeszutek Wilk wrote:
> > On Sun, Dec 04, 2011 at 01:09:28PM +0100, Carsten Schiers wrote:
> > > Here with two cards enabled, creating a bit of "work" by watching TV with one of them:
> > > 
> > > [   23.842720] Starting SWIOTLB debug thread.
> > > [   23.842750] swiotlb_start_thread: Go!
> > > [   23.842838] xen_swiotlb_start_thread: Go!
> > > [   28.841451] 0 [budget_av 0000:00:01.0] bounce: from:435596(slow:0)to:0 map:658 unmap:0 sync:435596
> > > [   28.841592] SWIOTLB is 4% full
> > > [   33.840147] 0 [budget_av 0000:00:01.0] bounce: from:127652(slow:0)to:0 map:0 unmap:0 sync:127652
> > > [   33.840283] SWIOTLB is 4% full
> > > [   33.844222] 0 budget_av 0000:00:01.0 alloc coherent: 8, free: 0
> > > [   38.840227] 0 [budget_av 0000:00:01.0] bounce: from:128310(slow:0)to:0 map:0 unmap:0 sync:128310
> > 
> > Whoa. Yes. You are definitely using the bounce buffer :-)
> > 
> > Now it is time to look at why the driver is not using those coherent ones - it
> > looks to allocate just eight of them but does not use them.. Unless it is
> > using them _and_ bouncing them (which would be odd).
> > 
> > And BTW, you can lower your 'swiotlb=XX' value.  The 4% is how much you
> > are using of the default size.
> 
> So I was able to see this with an atl1c ethernet driver on my SandyBridge i3
> box. It looks as if the card is truly 32-bit, so on a box with 8GB it
> bounces the data. If I boot the Xen hypervisor with 'mem=4GB' I get no
> bounces (no surprise there).
> 
> In other words - I see the same behavior you are seeing. Now off to:
> > 
> > I should find out _why_ the old Xen kernels do not use the bounce buffer
> > so much...
> 
> which will require some fiddling around.

And I am not seeing any difference - the swiotlb shows the same usage when
booting a classic (old-style XenoLinux) 2.6.32 vs a brand new pvops (3.2).
Obviously, if I limit the amount of physical memory ('mem=4GB' on the Xen
hypervisor line), the bounce usage disappears. Hmm, I wonder if there is a
nice way to tell the hypervisor - hey, please stuff dom0 under 4GB.

Here is the patch I used against classic XenLinux. Any chance you could run
it with your classic guests and see what numbers you get?



[-- Attachment #2: swiotlb-against-old-type.patch --]
[-- Type: text/plain, Size: 7793 bytes --]

diff --git a/drivers/xen/Kconfig b/drivers/xen/Kconfig
index ab0bb23..17faefd 100644
--- a/drivers/xen/Kconfig
+++ b/drivers/xen/Kconfig
@@ -469,3 +469,10 @@ config XEN_SYS_HYPERVISOR
 	 hypervisor environment.  When running native or in another
 	 virtual environment, /sys/hypervisor will still be present,
 	 but will have no xen contents.
+
+config SWIOTLB_DEBUG
+	tristate "swiotlb debug facility."
+	default m
+	help
+	  Do not enable it.
+
diff --git a/drivers/xen/Makefile b/drivers/xen/Makefile
index 28fb50a..df84614 100644
--- a/drivers/xen/Makefile
+++ b/drivers/xen/Makefile
@@ -42,3 +42,4 @@ obj-$(CONFIG_XEN_GRANT_DEV)	+= gntdev/
 obj-$(CONFIG_XEN_NETDEV_ACCEL_SFC_UTIL)		+= sfc_netutil/
 obj-$(CONFIG_XEN_NETDEV_ACCEL_SFC_FRONTEND)	+= sfc_netfront/
 obj-$(CONFIG_XEN_NETDEV_ACCEL_SFC_BACKEND)	+= sfc_netback/
+obj-$(CONFIG_SWIOTLB_DEBUG)			+= dump_swiotlb.o
diff --git a/drivers/xen/dump_swiotlb.c b/drivers/xen/dump_swiotlb.c
new file mode 100644
index 0000000..7168eed
--- /dev/null
+++ b/drivers/xen/dump_swiotlb.c
@@ -0,0 +1,72 @@
+/*
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License v2.0 as published by
+ * the Free Software Foundation
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ */
+
+#include <linux/module.h>
+#include <linux/string.h>
+#include <linux/types.h>
+#include <linux/init.h>
+#include <linux/stat.h>
+#include <linux/err.h>
+#include <linux/ctype.h>
+#include <linux/slab.h>
+#include <linux/limits.h>
+#include <linux/device.h>
+#include <linux/pci.h>
+#include <linux/blkdev.h>
+#include <linux/device.h>
+
+#include <linux/init.h>
+#include <linux/mm.h>
+#include <linux/fcntl.h>
+#include <linux/slab.h>
+#include <linux/kmod.h>
+#include <linux/major.h>
+#include <linux/highmem.h>
+#include <linux/blkdev.h>
+#include <linux/module.h>
+#include <linux/blkpg.h>
+#include <linux/buffer_head.h>
+#include <linux/mpage.h>
+#include <linux/mount.h>
+#include <linux/uio.h>
+#include <linux/namei.h>
+#include <asm/uaccess.h>
+
+#include <linux/pagemap.h>
+#include <linux/pagevec.h>
+
+#include <linux/swiotlb.h>
+#define DUMP_SWIOTLB_FUN  "0.1"
+
+MODULE_AUTHOR("Konrad Rzeszutek Wilk <konrad@darnok.org>");
+MODULE_DESCRIPTION("dump swiotlb");
+MODULE_LICENSE("GPL");
+MODULE_VERSION(DUMP_SWIOTLB_FUN);
+/*
+extern int xen_swiotlb_start_thread(void);
+extern void xen_swiotlb_stop_thread(void);*/
+static int __init dump_swiotlb_init(void)
+{
+	printk(KERN_INFO "Starting SWIOTLB debug thread.\n");
+	swiotlb_start_thread();
+	//xen_swiotlb_start_thread();
+	return 0;
+}
+
+static void __exit dump_swiotlb_exit(void)
+{
+	swiotlb_stop_thread();
+	//xen_swiotlb_stop_thread();
+}
+
+module_init(dump_swiotlb_init);
+module_exit(dump_swiotlb_exit);
diff --git a/include/linux/swiotlb.h b/include/linux/swiotlb.h
index 73b1f1c..81f5a1e 100644
--- a/include/linux/swiotlb.h
+++ b/include/linux/swiotlb.h
@@ -7,6 +7,9 @@ struct device;
 struct dma_attrs;
 struct scatterlist;
 
+ 
+extern int swiotlb_start_thread(void);
+extern void swiotlb_stop_thread(void);
 /*
  * Maximum allowable number of contiguous slabs to map,
  * must be a power of 2.  What is the appropriate value ?
diff --git a/lib/swiotlb-xen.c b/lib/swiotlb-xen.c
index 152696c..d1df462 100644
--- a/lib/swiotlb-xen.c
+++ b/lib/swiotlb-xen.c
@@ -118,6 +118,78 @@ setup_io_tlb_npages(char *str)
 }
 __setup("swiotlb=", setup_io_tlb_npages);
 /* make io_tlb_overflow tunable too? */
+ 
+#include <linux/percpu.h>
+struct swiotlb_debug {
+	unsigned long bounce_to;
+	unsigned long bounce_from;
+	unsigned long bounce_slow;
+	unsigned long map;
+	unsigned long unmap;
+	unsigned long sync;
+	char dev_name[64];
+};
+
+static DEFINE_PER_CPU(struct swiotlb_debug, tlb_debug);
+#include <linux/kthread.h>
+static int swiotlb_debug_thread(void *arg)
+{
+	int cpu;
+	int size = io_tlb_nslabs;
+	do {
+		int i;
+		unsigned long filled = 0;
+		set_current_state(TASK_INTERRUPTIBLE);
+		schedule_timeout_interruptible(HZ*5);
+
+		for_each_online_cpu(cpu) {
+			struct swiotlb_debug *d = &per_cpu(tlb_debug, cpu);
+			/* Can't really happen. */
+			if (!d)
+				continue;
+			if (d->dev_name[0] == 0)
+				continue;
+
+			printk(KERN_INFO "%d [%s] bounce: from:%ld(slow:%ld)to:%ld map:%ld unmap:%ld sync:%ld\n",
+				cpu,
+				d->dev_name ? d->dev_name : "?",
+				d->bounce_from,
+				d->bounce_slow,
+				d->bounce_to,
+				d->map, d->unmap, d->sync);
+			memset(d, 0, sizeof(struct swiotlb_debug));
+		}
+		/* Very crude calculation. */
+		for (i = 0; i < size; i++) {
+			if (io_tlb_list[i] == 0)
+				filled++;
+		}
+		printk(KERN_INFO "SWIOTLB is %ld%% full\n", (filled * 100) / size);
+
+	} while (!kthread_should_stop());
+	return 0;
+}
+static struct task_struct *debug_thread = NULL;
+
+
+int swiotlb_start_thread(void) {
+
+	if (debug_thread)
+		return -EINVAL;
+	printk(KERN_INFO "%s: Go!\n",__func__);
+	debug_thread = kthread_run(swiotlb_debug_thread, NULL, "swiotlb_debug");
+	return 0;
+}
+EXPORT_SYMBOL_GPL(swiotlb_start_thread);
+void swiotlb_stop_thread(void) {
+
+	printk(KERN_INFO "%s: Stop!\n",__func__);
+	if (debug_thread)
+		kthread_stop(debug_thread);
+	debug_thread = NULL;
+}
+EXPORT_SYMBOL_GPL(swiotlb_stop_thread);
+
 
 /* Note that this doesn't work with highmem page */
 static dma_addr_t swiotlb_virt_to_bus(struct device *hwdev,
@@ -270,6 +342,11 @@ static void swiotlb_bounce(phys_addr_t phys, char *dma_addr, size_t size,
 			   enum dma_data_direction dir)
 {
 	unsigned long pfn = PFN_DOWN(phys);
+	struct swiotlb_debug *d;
+
+	preempt_disable();
+	d = &__get_cpu_var(tlb_debug);
+	preempt_enable();
 
 	if (PageHighMem(pfn_to_page(pfn))) {
 		/* The buffer does not have a mapping.  Map it in and copy */
@@ -297,12 +374,18 @@ static void swiotlb_bounce(phys_addr_t phys, char *dma_addr, size_t size,
 			dma_addr += sz;
 			offset = 0;
 		}
+		d->bounce_slow++;
 	} else {
-		if (dir == DMA_TO_DEVICE)
+		if (dir == DMA_TO_DEVICE) {
 			memcpy(dma_addr, phys_to_virt(phys), size);
-		else if (__copy_to_user_inatomic(phys_to_virt(phys),
+			d->bounce_to++;
+		}
+		else {
+			if (__copy_to_user_inatomic(phys_to_virt(phys),
 						 dma_addr, size))
 			/* inaccessible */;
+			d->bounce_from++;
+		}
 	}
 }
 
@@ -406,6 +489,16 @@ found:
 	if (dir == DMA_TO_DEVICE || dir == DMA_BIDIRECTIONAL)
 		swiotlb_bounce(phys, dma_addr, size, DMA_TO_DEVICE);
 
+	{
+		struct swiotlb_debug *d;
+		preempt_disable();
+		d = &__get_cpu_var(tlb_debug);
+		preempt_enable();
+		d->map++;
+		snprintf(d->dev_name, sizeof(d->dev_name), "%s %s",
+                	dev_driver_string(hwdev), dev_name(hwdev));
+	}
+
 	return dma_addr;
 }
 
@@ -453,6 +546,17 @@ do_unmap_single(struct device *hwdev, char *dma_addr, size_t size, int dir)
 			io_tlb_list[i] = ++count;
 	}
 	spin_unlock_irqrestore(&io_tlb_lock, flags);
+
+	{
+		struct swiotlb_debug *d;
+		preempt_disable();
+		d = &__get_cpu_var(tlb_debug);
+		preempt_enable();
+		d->unmap++;
+		snprintf(d->dev_name, sizeof(d->dev_name), "%s %s",
+                	dev_driver_string(hwdev), dev_name(hwdev));
+	}
+
 }
 
 static void
@@ -462,6 +566,14 @@ sync_single(struct device *hwdev, char *dma_addr, size_t size,
 	int index = (dma_addr - io_tlb_start) >> IO_TLB_SHIFT;
 	phys_addr_t phys = io_tlb_orig_addr[index];
 
+	struct swiotlb_debug *d;
+	preempt_disable();
+	d = &__get_cpu_var(tlb_debug);
+	preempt_enable();
+	d->sync++;
+	snprintf(d->dev_name, sizeof(d->dev_name), "%s %s",
+               	dev_driver_string(hwdev), dev_name(hwdev));
+
 	phys += ((unsigned long)dma_addr & ((1 << IO_TLB_SHIFT) - 1));
 
 	switch (target) {

[-- Attachment #3: Type: text/plain, Size: 138 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel

^ permalink raw reply related	[flat|nested] 66+ messages in thread

* Re: Load increase after memory upgrade (part2)
  2011-12-14 22:07                       ` Konrad Rzeszutek Wilk
@ 2011-12-15 14:52                         ` Carsten Schiers
  2011-12-16 14:56                         ` Carsten Schiers
  1 sibling, 0 replies; 66+ messages in thread
From: Carsten Schiers @ 2011-12-15 14:52 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk, Konrad Rzeszutek Wilk
  Cc: linux, xen-devel, lersek, zhenzhong.duan, Ian Campbell


[-- Attachment #1.1: Type: text/plain, Size: 296 bytes --]

...

> which will require some fiddling around.

Here is the patch I used against classic XenLinux. Any chance you could run
it with your classic guests and see what numbers you get?

Sure, it might take a bit, but I'll try it with my 2.6.34 classic kernel.

 
Carsten.


[-- Attachment #1.2: Type: text/html, Size: 1166 bytes --]

[-- Attachment #2: Type: text/plain, Size: 138 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: Load increase after memory upgrade (part2)
  2011-12-14 22:07                       ` Konrad Rzeszutek Wilk
  2011-12-15 14:52                         ` Carsten Schiers
@ 2011-12-16 14:56                         ` Carsten Schiers
  2011-12-16 15:04                           ` Konrad Rzeszutek Wilk
  1 sibling, 1 reply; 66+ messages in thread
From: Carsten Schiers @ 2011-12-16 14:56 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk
  Cc: linux, xen-devel, lersek, zhenzhong.duan, Ian Campbell


[-- Attachment #1.1: Type: text/plain, Size: 789 bytes --]

Well, it will do nothing but print out “SWIOTLB is 0% full”.

 
Does that help? Or do you think something went wrong with the patch…

 
BR,

Carsten.

 
 
 
From: Carsten Schiers 
Sent: Thursday, 15 December 2011 15:53
To: Konrad Rzeszutek Wilk; Konrad Rzeszutek Wilk
Cc: linux@eikelenboom.it; zhenzhong.duan@oracle.com; Ian Campbell; lersek@redhat.com; xen-devel
Subject: Re: [Xen-devel] Load increase after memory upgrade (part2)

 
...

> which will require some fiddling around.

Here is the patch I used against classic XenLinux. Any chance you could run
it with your classic guests and see what numbers you get?

Sure, it might take a bit, but I'll try it with my 2.6.34 classic kernel.

 
Carsten.


[-- Attachment #1.2: Type: text/html, Size: 4946 bytes --]

[-- Attachment #2: Type: text/plain, Size: 138 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: Load increase after memory upgrade (part2)
  2011-12-16 14:56                         ` Carsten Schiers
@ 2011-12-16 15:04                           ` Konrad Rzeszutek Wilk
  2011-12-16 15:51                             ` Carsten Schiers
  0 siblings, 1 reply; 66+ messages in thread
From: Konrad Rzeszutek Wilk @ 2011-12-16 15:04 UTC (permalink / raw)
  To: Carsten Schiers; +Cc: linux, xen-devel, lersek, zhenzhong.duan, Ian Campbell

On Fri, Dec 16, 2011 at 03:56:10PM +0100, Carsten Schiers wrote:
> Well, it will do nothing but print out “SWIOTLB is 0% full”.
> 
>  
> Does that help? Or do you think something went wrong with the patch…
> 

And you are using swiotlb=force on the 2.6.34 classic kernel and passing
in your budget-av card in it? Could you append the dmesg output please?


Thanks.
>  
> BR,
> 
> Carsten.
> 
>  
>  
>  
> From: Carsten Schiers 
> Sent: Thursday, 15 December 2011 15:53
> To: Konrad Rzeszutek Wilk; Konrad Rzeszutek Wilk
> Cc: linux@eikelenboom.it; zhenzhong.duan@oracle.com; Ian Campbell; lersek@redhat.com; xen-devel
> Subject: Re: [Xen-devel] Load increase after memory upgrade (part2)
> 
>  
> ...
> 
> > which will require some fiddling around.
> 
> Here is the patch I used against classic XenLinux. Any chance you could run
> it with your classic guests and see what numbers you get?
> 
> Sure, it might take a bit, but I'll try it with my 2.6.34 classic kernel.
> 
>  
> Carsten.
> 

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: Load increase after memory upgrade (part2)
  2011-12-16 15:04                           ` Konrad Rzeszutek Wilk
@ 2011-12-16 15:51                             ` Carsten Schiers
  2011-12-16 16:19                               ` Konrad Rzeszutek Wilk
  0 siblings, 1 reply; 66+ messages in thread
From: Carsten Schiers @ 2011-12-16 15:51 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk
  Cc: linux, xen-devel, lersek, zhenzhong.duan, Ian Campbell

[-- Attachment #1: Type: text/plain, Size: 291 bytes --]

> And you are using swiotlb=force on the 2.6.34 classic kernel and passing in your budget-av card in it?

Yes, two of them with swiotlb=32,force.


> Could you append the dmesg output please?

Attached. You'll find a "normal" boot after the one with the patched kernel.

Carsten.



[-- Attachment #2: dmesg.txt --]
[-- Type: application/octet-stream, Size: 30840 bytes --]

Dec 16 15:53:11 riker kernel: Linux version 2.6.34.7.1-xen-amd64 (root@chekotey) (gcc version 4.4.5 (Debian 4.4.5-8) ) #7 SMP Fri Dec 16 13:36:33 CET 2011
Dec 16 15:53:11 riker kernel: Command line: root=/dev/xvda1 ro swiotlb=32,force xencons=tty
Dec 16 15:53:11 riker kernel: Xen-provided physical RAM map:
Dec 16 15:53:11 riker kernel: Xen: 0000000000000000 - 0000000014800000 (usable)
Dec 16 15:53:11 riker kernel: NX (Execute Disable) protection: active
Dec 16 15:53:11 riker kernel: last_pfn = 0x14800 max_arch_pfn = 0x80000000
Dec 16 15:53:11 riker kernel: init_memory_mapping: 0000000000000000-0000000014800000
Dec 16 15:53:11 riker kernel: RAMDISK: 007fb000 - 01006000
Dec 16 15:53:11 riker kernel: ACPI in unprivileged domain disabled
Dec 16 15:53:11 riker kernel: Zone PFN ranges:
Dec 16 15:53:11 riker kernel:  DMA      0x00000000 -> 0x00001000
Dec 16 15:53:11 riker kernel:  DMA32    0x00001000 -> 0x00100000
Dec 16 15:53:11 riker kernel:  Normal   empty
Dec 16 15:53:11 riker kernel: Movable zone start PFN for each node
Dec 16 15:53:11 riker kernel: early_node_map[2] active PFN ranges
Dec 16 15:53:11 riker kernel:    0: 0x00000000 -> 0x00014000
Dec 16 15:53:11 riker kernel:    0: 0x00014800 -> 0x00014800
Dec 16 15:53:11 riker kernel: setup_percpu: NR_CPUS:32 nr_cpumask_bits:32 nr_cpu_ids:1 nr_node_ids:1
Dec 16 15:53:11 riker kernel: PERCPU: Embedded 17 pages/cpu @ffff88000100a000 s39656 r8192 d21784 u69632
Dec 16 15:53:11 riker kernel: pcpu-alloc: s39656 r8192 d21784 u69632 alloc=17*4096
Dec 16 15:53:11 riker kernel: pcpu-alloc: [0] 0 
Dec 16 15:53:11 riker kernel: Built 1 zonelists in Zone order, mobility grouping on.  Total pages: 80772
Dec 16 15:53:11 riker kernel: Kernel command line: root=/dev/xvda1 ro swiotlb=32,force xencons=tty
Dec 16 15:53:11 riker kernel: PID hash table entries: 2048 (order: 2, 16384 bytes)
Dec 16 15:53:11 riker kernel: Dentry cache hash table entries: 65536 (order: 7, 524288 bytes)
Dec 16 15:53:11 riker kernel: Inode-cache hash table entries: 32768 (order: 6, 262144 bytes)
Dec 16 15:53:11 riker kernel: Software IO TLB enabled: 
Dec 16 15:53:11 riker kernel: Aperture:     32 megabytes
Dec 16 15:53:11 riker kernel: Address size: 28 bits
Dec 16 15:53:11 riker kernel: Kernel range: ffff8800016c3000 - ffff8800036c3000
Dec 16 15:53:11 riker kernel: PCI-DMA: Using software bounce buffering for IO (SWIOTLB)
Dec 16 15:53:11 riker kernel: Subtract (35 early reservations)
Dec 16 15:53:11 riker kernel:  #1 [00007fb000 - 0001006000]    Xen provided
Dec 16 15:53:11 riker kernel:  #2 [0000200000 - 00007daa94]   TEXT DATA BSS
Dec 16 15:53:11 riker kernel:  #3 [00010b6000 - 000115c000]         PGTABLE
Dec 16 15:53:11 riker kernel:  #4 [0014000000 - 0014800000]         BALLOON
Dec 16 15:53:11 riker kernel:  #5 [000115c000 - 00015d8000]         BOOTMEM
Dec 16 15:53:11 riker kernel:  #6 [00015d8000 - 00015d8008]         BOOTMEM
Dec 16 15:53:11 riker kernel:  #7 [00015d8040 - 00015d81c0]         BOOTMEM
Dec 16 15:53:11 riker kernel:  #8 [00015d81c0 - 00015d81e0]         BOOTMEM
Dec 16 15:53:11 riker kernel:  #9 [00015d8200 - 00015db200]         BOOTMEM
Dec 16 15:53:11 riker kernel:  #10 [00015dc000 - 00015dd000]         BOOTMEM
Dec 16 15:53:11 riker kernel:  #11 [00015dd000 - 00015de000]         BOOTMEM
Dec 16 15:53:11 riker kernel:  #12 [00015de000 - 00015df000]         BOOTMEM
Dec 16 15:53:11 riker kernel:  #13 [00015df000 - 0001683000]         BOOTMEM
Dec 16 15:53:11 riker kernel:  #14 [00010a6000 - 00010b6000]    Xen provided
Dec 16 15:53:11 riker kernel:  #15 [0001006000 - 0001006010]         BOOTMEM
Dec 16 15:53:11 riker kernel:  #16 [0001006040 - 0001006048]         BOOTMEM
Dec 16 15:53:11 riker kernel:  #17 [0001007000 - 0001008000]         BOOTMEM
Dec 16 15:53:11 riker kernel:  #18 [0001006080 - 00010060b0]         BOOTMEM
Dec 16 15:53:11 riker kernel:  #19 [00010060c0 - 00010060f0]         BOOTMEM
Dec 16 15:53:11 riker kernel:  #20 [000100a000 - 000101b000]         BOOTMEM
Dec 16 15:53:11 riker kernel:  #21 [0001006100 - 0001006108]         BOOTMEM
Dec 16 15:53:11 riker kernel:  #22 [0001006140 - 0001006148]         BOOTMEM
Dec 16 15:53:11 riker kernel:  #23 [0001006180 - 0001006184]         BOOTMEM
Dec 16 15:53:11 riker kernel:  #24 [00010061c0 - 00010061c8]         BOOTMEM
Dec 16 15:53:11 riker kernel:  #25 [0001006200 - 0001006300]         BOOTMEM
Dec 16 15:53:11 riker kernel:  #26 [0001006300 - 0001006348]         BOOTMEM
Dec 16 15:53:11 riker kernel:  #27 [0001006380 - 00010063c8]         BOOTMEM
Dec 16 15:53:11 riker kernel:  #28 [000101b000 - 000101f000]         BOOTMEM
Dec 16 15:53:11 riker kernel:  #29 [000101f000 - 000109f000]         BOOTMEM
Dec 16 15:53:11 riker kernel:  #30 [0001683000 - 00016c3000]         BOOTMEM
Dec 16 15:53:11 riker kernel:  #31 [00016c3000 - 00036c3000]         BOOTMEM
Dec 16 15:53:11 riker kernel:  #32 [00036c3000 - 00036d3000]         BOOTMEM
Dec 16 15:53:11 riker kernel:  #33 [00036d3000 - 00036f3000]         BOOTMEM
Dec 16 15:53:11 riker kernel:  #34 [00036f3000 - 00036fb000]         BOOTMEM
Dec 16 15:53:11 riker kernel: Memory: 273592k/335872k available (3229k kernel code, 8192k absent, 54088k reserved, 1722k data, 320k init)
Dec 16 15:53:11 riker kernel: Hierarchical RCU implementation.
Dec 16 15:53:11 riker kernel: RCU-based detection of stalled CPUs is enabled.
Dec 16 15:53:11 riker kernel: NR_IRQS:1600
Dec 16 15:53:11 riker kernel: Xen reported: 2210.038 MHz processor.
Dec 16 15:53:11 riker kernel: Console: colour dummy device 80x25
Dec 16 15:53:11 riker kernel: console [tty0] enabled
Dec 16 15:53:11 riker kernel: console [tty-1] enabled
Dec 16 15:53:11 riker kernel: Calibrating delay using timer specific routine.. 4478.58 BogoMIPS (lpj=8957173)
Dec 16 15:53:11 riker kernel: Security Framework initialized
Dec 16 15:53:11 riker kernel: SELinux:  Disabled at boot.
Dec 16 15:53:11 riker kernel: Mount-cache hash table entries: 256
Dec 16 15:53:11 riker kernel: SMP alternatives: switching to UP code
Dec 16 15:53:11 riker kernel: Freeing SMP alternatives: 36k freed
Dec 16 15:53:11 riker kernel: Brought up 1 CPUs
Dec 16 15:53:11 riker kernel: NET: Registered protocol family 16
Dec 16 15:53:11 riker kernel: Brought up 1 CPUs
Dec 16 15:53:11 riker kernel: PCI: setting up Xen PCI frontend stub
Dec 16 15:53:11 riker kernel: bio: create slab <bio-0> at 0
Dec 16 15:53:11 riker kernel: ACPI: Interpreter disabled.
Dec 16 15:53:11 riker kernel: vgaarb: loaded
Dec 16 15:53:11 riker kernel: xen_mem: Initialising balloon driver.
Dec 16 15:53:11 riker kernel: SCSI subsystem initialized
Dec 16 15:53:11 riker kernel: usbcore: registered new interface driver usbfs
Dec 16 15:53:11 riker kernel: usbcore: registered new interface driver hub
Dec 16 15:53:11 riker kernel: usbcore: registered new device driver usb
Dec 16 15:53:11 riker kernel: PCI: System does not support PCI
Dec 16 15:53:11 riker kernel: PCI: System does not support PCI
Dec 16 15:53:11 riker kernel: NET: Registered protocol family 8
Dec 16 15:53:11 riker kernel: NET: Registered protocol family 20
Dec 16 15:53:11 riker kernel: Switching to clocksource xen
Dec 16 15:53:11 riker kernel: pnp: PnP ACPI: disabled
Dec 16 15:53:11 riker kernel: NET: Registered protocol family 2
Dec 16 15:53:11 riker kernel: IP route cache hash table entries: 4096 (order: 3, 32768 bytes)
Dec 16 15:53:11 riker kernel: TCP established hash table entries: 16384 (order: 6, 262144 bytes)
Dec 16 15:53:11 riker kernel: TCP bind hash table entries: 16384 (order: 6, 262144 bytes)
Dec 16 15:53:11 riker kernel: TCP: Hash tables configured (established 16384 bind 16384)
Dec 16 15:53:11 riker kernel: TCP reno registered
Dec 16 15:53:11 riker kernel: UDP hash table entries: 256 (order: 1, 8192 bytes)
Dec 16 15:53:11 riker kernel: UDP-Lite hash table entries: 256 (order: 1, 8192 bytes)
Dec 16 15:53:11 riker kernel: NET: Registered protocol family 1
Dec 16 15:53:11 riker kernel: Trying to unpack rootfs image as initramfs...
Dec 16 15:53:11 riker kernel: pcifront pci-0: Installing PCI frontend
Dec 16 15:53:11 riker kernel: pcifront pci-0: Creating PCI Frontend Bus 0000:00
Dec 16 15:53:11 riker kernel: Freeing initrd memory: 8236k freed
Dec 16 15:53:11 riker kernel: platform rtc_cmos: registered platform RTC device (no PNP device found)
Dec 16 15:53:11 riker kernel: audit: initializing netlink socket (disabled)
Dec 16 15:53:11 riker kernel: type=2000 audit(1324047186.334:1): initialized
Dec 16 15:53:11 riker kernel: VFS: Disk quotas dquot_6.5.2
Dec 16 15:53:11 riker kernel: Dquot-cache hash table entries: 512 (order 0, 4096 bytes)
Dec 16 15:53:11 riker kernel: NTFS driver 2.1.29 [Flags: R/W].
Dec 16 15:53:11 riker kernel: msgmni has been set to 656
Dec 16 15:53:11 riker kernel: alg: No test for stdrng (krng)
Dec 16 15:53:11 riker kernel: Block layer SCSI generic (bsg) driver version 0.4 loaded (major 254)
Dec 16 15:53:11 riker kernel: io scheduler noop registered
Dec 16 15:53:11 riker kernel: io scheduler deadline registered
Dec 16 15:53:11 riker kernel: io scheduler cfq registered (default)
Dec 16 15:53:11 riker kernel: Linux agpgart interface v0.103
Dec 16 15:53:11 riker kernel: brd: module loaded
Dec 16 15:53:11 riker kernel: Xen virtual console successfully installed as tty1
Dec 16 15:53:11 riker kernel: Event-channel device installed.
Dec 16 15:53:11 riker kernel: blktap_device_init: blktap device major 254
Dec 16 15:53:11 riker kernel: blktap_ring_init: blktap ring major: 252
Dec 16 15:53:11 riker kernel: netfront: Initialising virtual ethernet driver.
Dec 16 15:53:11 riker kernel: xen-vbd: registered block device major 202
Dec 16 15:53:11 riker kernel: blkfront: xvda1: barriers enabled
Dec 16 15:53:11 riker kernel: blkfront: xvda2: barriers enabled
Dec 16 15:53:11 riker kernel: Setting capacity to 8388608
Dec 16 15:53:11 riker kernel: xvda1: detected capacity change from 0 to 4294967296
Dec 16 15:53:11 riker kernel: blkfront: xvdb1: barriers enabled
Dec 16 15:53:11 riker kernel: Setting capacity to 2097152
Dec 16 15:53:11 riker kernel: xvda2: detected capacity change from 0 to 1073741824
Dec 16 15:53:11 riker kernel: Setting capacity to 2930272002
Dec 16 15:53:11 riker kernel: xvdb1: detected capacity change from 0 to 1500299265024
Dec 16 15:53:11 riker kernel: usbcore: registered new interface driver usbback
Dec 16 15:53:11 riker kernel: PPP generic driver version 2.4.2
Dec 16 15:53:11 riker kernel: PNP: No PS/2 controller found. Probing ports directly.
Dec 16 15:53:11 riker kernel: mice: PS/2 mouse device common for all mice
Dec 16 15:53:11 riker kernel: TCP bic registered
Dec 16 15:53:11 riker kernel: NET: Registered protocol family 17
Dec 16 15:53:11 riker kernel: PCI IO multiplexer device installed.
Dec 16 15:53:11 riker kernel: Freeing unused kernel memory: 320k freed
Dec 16 15:53:11 riker kernel: kjournald starting.  Commit interval 5 seconds
Dec 16 15:53:11 riker kernel: EXT3-fs (xvda1): mounted filesystem with writeback data mode
Dec 16 15:53:11 riker kernel: udev[177]: starting version 164
Dec 16 15:53:11 riker kernel: Linux video capture interface: v2.00
Dec 16 15:53:11 riker kernel: saa7146: register extension 'budget_av'.
Dec 16 15:53:11 riker kernel: budget_av 0000:00:00.0: enabling device (0000 -> 0002)
Dec 16 15:53:11 riker kernel: IRQ 17/: IRQF_DISABLED is not guaranteed on shared IRQs
Dec 16 15:53:11 riker kernel: saa7146: found saa7146 @ mem ffffc90000246000 (revision 1, irq 17) (0x1894,0x0028).
Dec 16 15:53:11 riker kernel: saa7146 (0): dma buffer size 1347584
Dec 16 15:53:11 riker kernel: DVB: registering new adapter (KNC1 DVB-C MK3)
Dec 16 15:53:11 riker kernel: adapter failed MAC signature check
Dec 16 15:53:11 riker kernel: encoded MAC from EEPROM was ff:ff:ff:ff:ff:ff:ff:ff:ff:ff:ff:ff:ff:ff:ff:ff:ff:ff:ff:ff
Dec 16 15:53:11 riker kernel: KNC1-0: MAC addr = 00:09:d6:6d:b3:0a
Dec 16 15:53:11 riker kernel: DVB: registering adapter 0 frontend 0 (Philips TDA10023 DVB-C)...
Dec 16 15:53:11 riker kernel: budget-av: ci interface initialised.
Dec 16 15:53:11 riker kernel: budget_av 0000:00:01.0: enabling device (0000 -> 0002)
Dec 16 15:53:11 riker kernel: IRQ 18/: IRQF_DISABLED is not guaranteed on shared IRQs
Dec 16 15:53:11 riker kernel: saa7146: found saa7146 @ mem ffffc900004de000 (revision 1, irq 18) (0x1894,0x002c).
Dec 16 15:53:11 riker kernel: saa7146 (1): dma buffer size 1347584
Dec 16 15:53:11 riker kernel: DVB: registering new adapter (Satelco EasyWatch DVB-C MK3)
Dec 16 15:53:11 riker kernel: adapter failed MAC signature check
Dec 16 15:53:11 riker kernel: encoded MAC from EEPROM was ff:ff:ff:ff:ff:ff:ff:ff:ff:ff:ff:ff:ff:ff:ff:ff:ff:ff:ff:ff
Dec 16 15:53:11 riker kernel: KNC1-1: MAC addr = 00:09:d6:6d:b0:33
Dec 16 15:53:11 riker kernel: DVB: registering adapter 1 frontend 0 (Philips TDA10023 DVB-C)...
Dec 16 15:53:11 riker kernel: budget-av: ci interface initialised.
Dec 16 15:53:11 riker kernel: Adding 1048572k swap on /dev/xvda2.  Priority:-1 extents:1 across:1048572k SS
Dec 16 15:53:11 riker kernel: EXT3-fs (xvda1): warning: maximal mount count reached, running e2fsck is recommended
Dec 16 15:53:11 riker kernel: EXT3-fs (xvda1): using internal journal
Dec 16 15:53:11 riker kernel: kjournald starting.  Commit interval 5 seconds
Dec 16 15:53:11 riker kernel: EXT3-fs (xvdb1): warning: maximal mount count reached, running e2fsck is recommended
Dec 16 15:53:11 riker kernel: EXT3-fs (xvdb1): using internal journal
Dec 16 15:53:11 riker kernel: EXT3-fs (xvdb1): mounted filesystem with writeback data mode
Dec 16 15:53:11 riker kernel: RPC: Registered udp transport module.
Dec 16 15:53:11 riker kernel: RPC: Registered tcp transport module.
Dec 16 15:53:11 riker kernel: RPC: Registered tcp NFSv4.1 backchannel transport module.
Dec 16 15:53:11 riker kernel: Installing knfsd (copyright (C) 1996 okir@monad.swb.de).
Dec 16 15:53:11 riker kernel: NET: Registered protocol family 10
Dec 16 15:53:11 riker kernel: lo: Disabled Privacy Extensions
Dec 16 15:53:11 riker kernel: svc: failed to register lockdv1 RPC service (errno 97).
Dec 16 15:53:11 riker kernel: NFSD: Using /var/lib/nfs/v4recovery as the NFSv4 state recovery directory
Dec 16 15:53:11 riker kernel: NFSD: starting 90-second grace period
Dec 16 15:53:29 riker kernel: Starting SWIOTLB debug thread.
Dec 16 15:53:29 riker kernel: swiotlb_start_thread: Go!
Dec 16 15:53:34 riker kernel: SWIOTLB is 0% full
Dec 16 15:53:39 riker kernel: SWIOTLB is 0% full
Dec 16 15:53:44 riker kernel: SWIOTLB is 0% full
Dec 16 15:53:49 riker kernel: SWIOTLB is 0% full
Dec 16 15:53:54 riker kernel: SWIOTLB is 0% full
Dec 16 15:53:59 riker kernel: SWIOTLB is 0% full
Dec 16 15:54:04 riker kernel: SWIOTLB is 0% full
Dec 16 15:54:09 riker kernel: SWIOTLB is 0% full
Dec 16 15:54:14 riker kernel: SWIOTLB is 0% full
Dec 16 15:54:19 riker kernel: SWIOTLB is 0% full
Dec 16 15:54:24 riker kernel: SWIOTLB is 0% full
Dec 16 15:54:29 riker kernel: SWIOTLB is 0% full
Dec 16 15:54:34 riker kernel: SWIOTLB is 0% full
Dec 16 15:54:39 riker kernel: SWIOTLB is 0% full
Dec 16 15:54:44 riker kernel: SWIOTLB is 0% full
Dec 16 15:54:49 riker kernel: SWIOTLB is 0% full
Dec 16 15:54:54 riker kernel: SWIOTLB is 0% full
Dec 16 15:54:59 riker kernel: SWIOTLB is 0% full
Dec 16 15:55:04 riker kernel: SWIOTLB is 0% full
Dec 16 15:55:09 riker kernel: SWIOTLB is 0% full
Dec 16 15:55:14 riker kernel: SWIOTLB is 0% full
Dec 16 15:55:19 riker kernel: SWIOTLB is 0% full
Dec 16 15:55:24 riker kernel: SWIOTLB is 0% full
Dec 16 15:55:29 riker kernel: SWIOTLB is 0% full
Dec 16 15:55:34 riker kernel: SWIOTLB is 0% full
Dec 16 15:55:39 riker kernel: SWIOTLB is 0% full
Dec 16 15:55:43 riker kernel: SWIOTLB is 0% full
Dec 16 15:55:48 riker kernel: SWIOTLB is 0% full
Dec 16 15:55:53 riker kernel: SWIOTLB is 0% full
Dec 16 15:55:58 riker kernel: SWIOTLB is 0% full
Dec 16 15:56:03 riker kernel: SWIOTLB is 0% full
Dec 16 15:56:08 riker kernel: SWIOTLB is 0% full
Dec 16 15:56:13 riker kernel: SWIOTLB is 0% full
Dec 16 15:56:18 riker kernel: SWIOTLB is 0% full
Dec 16 15:56:23 riker kernel: SWIOTLB is 0% full
Dec 16 15:56:28 riker kernel: SWIOTLB is 0% full
Dec 16 15:56:33 riker kernel: SWIOTLB is 0% full
Dec 16 15:56:36 riker kernel: swiotlb_stop_thread: Stop!
Dec 16 15:56:36 riker kernel: SWIOTLB is 0% full
Dec 16 15:57:22 riker shutdown[1281]: shutting down for system halt
Dec 16 15:57:23 riker kernel: nfsd: last server has exited, flushing export cache
Dec 16 15:57:25 riker kernel: Kernel logging (proc) stopped.
Dec 16 15:57:25 riker rsyslogd: [origin software="rsyslogd" swVersion="4.6.4" x-pid="610" x-info="http://www.rsyslog.com"] exiting on signal 15.
Dec 16 15:58:00 riker kernel: imklog 4.6.4, log source = /proc/kmsg started.
Dec 16 15:58:00 riker rsyslogd: [origin software="rsyslogd" swVersion="4.6.4" x-pid="614" x-info="http://www.rsyslog.com"] (re)start

################################# normal boot ###########################################

Dec 16 15:58:00 riker kernel: Linux version 2.6.34.7.1-xen-amd64 (root@chekotey) (gcc version 4.4.5 (Debian 4.4.5-8) ) #6 SMP Wed Aug 17 11:49:53 CEST 2011
Dec 16 15:58:00 riker kernel: Command line: root=/dev/xvda1 ro swiotlb=32,force xencons=tty
Dec 16 15:58:00 riker kernel: Xen-provided physical RAM map:
Dec 16 15:58:00 riker kernel: Xen: 0000000000000000 - 0000000014800000 (usable)
Dec 16 15:58:00 riker kernel: NX (Execute Disable) protection: active
Dec 16 15:58:00 riker kernel: last_pfn = 0x14800 max_arch_pfn = 0x80000000
Dec 16 15:58:00 riker kernel: init_memory_mapping: 0000000000000000-0000000014800000
Dec 16 15:58:00 riker kernel: RAMDISK: 007fb000 - 01006000
Dec 16 15:58:00 riker kernel: ACPI in unprivileged domain disabled
Dec 16 15:58:00 riker kernel: Zone PFN ranges:
Dec 16 15:58:00 riker kernel:  DMA      0x00000000 -> 0x00001000
Dec 16 15:58:00 riker kernel:  DMA32    0x00001000 -> 0x00100000
Dec 16 15:58:00 riker kernel:  Normal   empty
Dec 16 15:58:00 riker kernel: Movable zone start PFN for each node
Dec 16 15:58:00 riker kernel: early_node_map[2] active PFN ranges
Dec 16 15:58:00 riker kernel:    0: 0x00000000 -> 0x00014000
Dec 16 15:58:00 riker kernel:    0: 0x00014800 -> 0x00014800
Dec 16 15:58:00 riker kernel: setup_percpu: NR_CPUS:32 nr_cpumask_bits:32 nr_cpu_ids:1 nr_node_ids:1
Dec 16 15:58:00 riker kernel: PERCPU: Embedded 17 pages/cpu @ffff88000100a000 s39592 r8192 d21848 u69632
Dec 16 15:58:00 riker kernel: pcpu-alloc: s39592 r8192 d21848 u69632 alloc=17*4096
Dec 16 15:58:00 riker kernel: pcpu-alloc: [0] 0 
Dec 16 15:58:00 riker kernel: Built 1 zonelists in Zone order, mobility grouping on.  Total pages: 80772
Dec 16 15:58:00 riker kernel: Kernel command line: root=/dev/xvda1 ro swiotlb=32,force xencons=tty
Dec 16 15:58:00 riker kernel: PID hash table entries: 2048 (order: 2, 16384 bytes)
Dec 16 15:58:00 riker kernel: Dentry cache hash table entries: 65536 (order: 7, 524288 bytes)
Dec 16 15:58:00 riker kernel: Inode-cache hash table entries: 32768 (order: 6, 262144 bytes)
Dec 16 15:58:00 riker kernel: Software IO TLB enabled: 
Dec 16 15:58:00 riker kernel: Aperture:     32 megabytes
Dec 16 15:58:00 riker kernel: Address size: 28 bits
Dec 16 15:58:00 riker kernel: Kernel range: ffff8800016c3000 - ffff8800036c3000
Dec 16 15:58:00 riker kernel: PCI-DMA: Using software bounce buffering for IO (SWIOTLB)
Dec 16 15:58:00 riker kernel: Subtract (35 early reservations)
Dec 16 15:58:00 riker kernel:  #1 [00007fb000 - 0001006000]    Xen provided
Dec 16 15:58:00 riker kernel:  #2 [0000200000 - 00007daa94]   TEXT DATA BSS
Dec 16 15:58:00 riker kernel:  #3 [00010b6000 - 000115c000]         PGTABLE
Dec 16 15:58:00 riker kernel:  #4 [0014000000 - 0014800000]         BALLOON
Dec 16 15:58:00 riker kernel:  #5 [000115c000 - 00015d8000]         BOOTMEM
Dec 16 15:58:00 riker kernel:  #6 [00015d8000 - 00015d8008]         BOOTMEM
Dec 16 15:58:00 riker kernel:  #7 [00015d8040 - 00015d81c0]         BOOTMEM
Dec 16 15:58:00 riker kernel:  #8 [00015d81c0 - 00015d81e0]         BOOTMEM
Dec 16 15:58:00 riker kernel:  #9 [00015d8200 - 00015db200]         BOOTMEM
Dec 16 15:58:00 riker kernel:  #10 [00015dc000 - 00015dd000]         BOOTMEM
Dec 16 15:58:00 riker kernel:  #11 [00015dd000 - 00015de000]         BOOTMEM
Dec 16 15:58:00 riker kernel:  #12 [00015de000 - 00015df000]         BOOTMEM
Dec 16 15:58:00 riker kernel:  #13 [00015df000 - 0001683000]         BOOTMEM
Dec 16 15:58:00 riker kernel:  #14 [00010a6000 - 00010b6000]    Xen provided
Dec 16 15:58:00 riker kernel:  #15 [0001006000 - 0001006010]         BOOTMEM
Dec 16 15:58:00 riker kernel:  #16 [0001006040 - 0001006048]         BOOTMEM
Dec 16 15:58:00 riker kernel:  #17 [0001007000 - 0001008000]         BOOTMEM
Dec 16 15:58:00 riker kernel:  #18 [0001006080 - 00010060b0]         BOOTMEM
Dec 16 15:58:00 riker kernel:  #19 [00010060c0 - 00010060f0]         BOOTMEM
Dec 16 15:58:00 riker kernel:  #20 [000100a000 - 000101b000]         BOOTMEM
Dec 16 15:58:00 riker kernel:  #21 [0001006100 - 0001006108]         BOOTMEM
Dec 16 15:58:00 riker kernel:  #22 [0001006140 - 0001006148]         BOOTMEM
Dec 16 15:58:00 riker kernel:  #23 [0001006180 - 0001006184]         BOOTMEM
Dec 16 15:58:00 riker kernel:  #24 [00010061c0 - 00010061c8]         BOOTMEM
Dec 16 15:58:00 riker kernel:  #25 [0001006200 - 0001006300]         BOOTMEM
Dec 16 15:58:00 riker kernel:  #26 [0001006300 - 0001006348]         BOOTMEM
Dec 16 15:58:00 riker kernel:  #27 [0001006380 - 00010063c8]         BOOTMEM
Dec 16 15:58:00 riker kernel:  #28 [000101b000 - 000101f000]         BOOTMEM
Dec 16 15:58:00 riker kernel:  #29 [000101f000 - 000109f000]         BOOTMEM
Dec 16 15:58:00 riker kernel:  #30 [0001683000 - 00016c3000]         BOOTMEM
Dec 16 15:58:00 riker kernel:  #31 [00016c3000 - 00036c3000]         BOOTMEM
Dec 16 15:58:00 riker kernel:  #32 [00036c3000 - 00036d3000]         BOOTMEM
Dec 16 15:58:00 riker kernel:  #33 [00036d3000 - 00036f3000]         BOOTMEM
Dec 16 15:58:00 riker kernel:  #34 [00036f3000 - 00036fb000]         BOOTMEM
Dec 16 15:58:00 riker kernel: Memory: 273592k/335872k available (3228k kernel code, 8192k absent, 54088k reserved, 1723k data, 320k init)
Dec 16 15:58:00 riker kernel: Hierarchical RCU implementation.
Dec 16 15:58:00 riker kernel: RCU-based detection of stalled CPUs is enabled.
Dec 16 15:58:00 riker kernel: NR_IRQS:1600
Dec 16 15:58:00 riker kernel: Xen reported: 2210.038 MHz processor.
Dec 16 15:58:00 riker kernel: Console: colour dummy device 80x25
Dec 16 15:58:00 riker kernel: console [tty0] enabled
Dec 16 15:58:00 riker kernel: console [tty-1] enabled
Dec 16 15:58:00 riker kernel: Calibrating delay using timer specific routine.. 4478.63 BogoMIPS (lpj=8957276)
Dec 16 15:58:00 riker kernel: Security Framework initialized
Dec 16 15:58:00 riker kernel: SELinux:  Disabled at boot.
Dec 16 15:58:00 riker kernel: Mount-cache hash table entries: 256
Dec 16 15:58:00 riker kernel: SMP alternatives: switching to UP code
Dec 16 15:58:00 riker kernel: Freeing SMP alternatives: 36k freed
Dec 16 15:58:00 riker kernel: Brought up 1 CPUs
Dec 16 15:58:00 riker kernel: NET: Registered protocol family 16
Dec 16 15:58:00 riker kernel: Brought up 1 CPUs
Dec 16 15:58:00 riker kernel: PCI: setting up Xen PCI frontend stub
Dec 16 15:58:00 riker kernel: bio: create slab <bio-0> at 0
Dec 16 15:58:00 riker kernel: ACPI: Interpreter disabled.
Dec 16 15:58:00 riker kernel: vgaarb: loaded
Dec 16 15:58:00 riker kernel: xen_mem: Initialising balloon driver.
Dec 16 15:58:00 riker kernel: SCSI subsystem initialized
Dec 16 15:58:00 riker kernel: usbcore: registered new interface driver usbfs
Dec 16 15:58:00 riker kernel: usbcore: registered new interface driver hub
Dec 16 15:58:00 riker kernel: usbcore: registered new device driver usb
Dec 16 15:58:00 riker kernel: PCI: System does not support PCI
Dec 16 15:58:00 riker kernel: PCI: System does not support PCI
Dec 16 15:58:00 riker kernel: NET: Registered protocol family 8
Dec 16 15:58:00 riker kernel: NET: Registered protocol family 20
Dec 16 15:58:00 riker kernel: Switching to clocksource xen
Dec 16 15:58:00 riker kernel: pnp: PnP ACPI: disabled
Dec 16 15:58:00 riker kernel: NET: Registered protocol family 2
Dec 16 15:58:00 riker kernel: IP route cache hash table entries: 4096 (order: 3, 32768 bytes)
Dec 16 15:58:00 riker kernel: TCP established hash table entries: 16384 (order: 6, 262144 bytes)
Dec 16 15:58:00 riker kernel: TCP bind hash table entries: 16384 (order: 6, 262144 bytes)
Dec 16 15:58:00 riker kernel: TCP: Hash tables configured (established 16384 bind 16384)
Dec 16 15:58:00 riker kernel: TCP reno registered
Dec 16 15:58:00 riker kernel: UDP hash table entries: 256 (order: 1, 8192 bytes)
Dec 16 15:58:00 riker kernel: UDP-Lite hash table entries: 256 (order: 1, 8192 bytes)
Dec 16 15:58:00 riker kernel: NET: Registered protocol family 1
Dec 16 15:58:00 riker kernel: Trying to unpack rootfs image as initramfs...
Dec 16 15:58:00 riker kernel: Freeing initrd memory: 8236k freed
Dec 16 15:58:00 riker kernel: platform rtc_cmos: registered platform RTC device (no PNP device found)
Dec 16 15:58:00 riker kernel: audit: initializing netlink socket (disabled)
Dec 16 15:58:00 riker kernel: type=2000 audit(1324047475.312:1): initialized
Dec 16 15:58:00 riker kernel: VFS: Disk quotas dquot_6.5.2
Dec 16 15:58:00 riker kernel: Dquot-cache hash table entries: 512 (order 0, 4096 bytes)
Dec 16 15:58:00 riker kernel: NTFS driver 2.1.29 [Flags: R/W].
Dec 16 15:58:00 riker kernel: msgmni has been set to 656
Dec 16 15:58:00 riker kernel: pcifront pci-0: Installing PCI frontend
Dec 16 15:58:00 riker kernel: alg: No test for stdrng (krng)
Dec 16 15:58:00 riker kernel: Block layer SCSI generic (bsg) driver version 0.4 loaded (major 254)
Dec 16 15:58:00 riker kernel: io scheduler noop registered
Dec 16 15:58:00 riker kernel: io scheduler deadline registered
Dec 16 15:58:00 riker kernel: io scheduler cfq registered (default)
Dec 16 15:58:00 riker kernel: Linux agpgart interface v0.103
Dec 16 15:58:00 riker kernel: brd: module loaded
Dec 16 15:58:00 riker kernel: Xen virtual console successfully installed as tty1
Dec 16 15:58:00 riker kernel: Event-channel device installed.
Dec 16 15:58:00 riker kernel: pcifront pci-0: Creating PCI Frontend Bus 0000:00
Dec 16 15:58:00 riker kernel: blktap_device_init: blktap device major 254
Dec 16 15:58:00 riker kernel: blktap_ring_init: blktap ring major: 252
Dec 16 15:58:00 riker kernel: netfront: Initialising virtual ethernet driver.
Dec 16 15:58:00 riker kernel: xen-vbd: registered block device major 202
Dec 16 15:58:00 riker kernel: blkfront: xvda1: barriers enabled
Dec 16 15:58:00 riker kernel: blkfront: xvda2: barriers enabled
Dec 16 15:58:00 riker kernel: Setting capacity to 8388608
Dec 16 15:58:00 riker kernel: xvda1: detected capacity change from 0 to 4294967296
Dec 16 15:58:00 riker kernel: Setting capacity to 2097152
Dec 16 15:58:00 riker kernel: xvda2: detected capacity change from 0 to 1073741824
Dec 16 15:58:00 riker kernel: blkfront: xvdb1: barriers enabled
Dec 16 15:58:00 riker kernel: usbcore: registered new interface driver usbback
Dec 16 15:58:00 riker kernel: PPP generic driver version 2.4.2
Dec 16 15:58:00 riker kernel: PNP: No PS/2 controller found. Probing ports directly.
Dec 16 15:58:00 riker kernel: mice: PS/2 mouse device common for all mice
Dec 16 15:58:00 riker kernel: TCP bic registered
Dec 16 15:58:00 riker kernel: NET: Registered protocol family 17
Dec 16 15:58:00 riker kernel: PCI IO multiplexer device installed.
Dec 16 15:58:00 riker kernel: Freeing unused kernel memory: 320k freed
Dec 16 15:58:00 riker kernel: kjournald starting.  Commit interval 5 seconds
Dec 16 15:58:00 riker kernel: EXT3-fs (xvda1): mounted filesystem with writeback data mode
Dec 16 15:58:00 riker kernel: udev[186]: starting version 164
Dec 16 15:58:00 riker kernel: Linux video capture interface: v2.00
Dec 16 15:58:00 riker kernel: saa7146: register extension 'budget_av'.
Dec 16 15:58:00 riker kernel: budget_av 0000:00:00.0: enabling device (0000 -> 0002)
Dec 16 15:58:00 riker kernel: IRQ 17/: IRQF_DISABLED is not guaranteed on shared IRQs
Dec 16 15:58:00 riker kernel: saa7146: found saa7146 @ mem ffffc9000023e000 (revision 1, irq 17) (0x1894,0x0028).
Dec 16 15:58:00 riker kernel: saa7146 (0): dma buffer size 1347584
Dec 16 15:58:00 riker kernel: DVB: registering new adapter (KNC1 DVB-C MK3)
Dec 16 15:58:00 riker kernel: adapter failed MAC signature check
Dec 16 15:58:00 riker kernel: encoded MAC from EEPROM was ff:ff:ff:ff:ff:ff:ff:ff:ff:ff:ff:ff:ff:ff:ff:ff:ff:ff:ff:ff
Dec 16 15:58:00 riker kernel: KNC1-0: MAC addr = 00:09:d6:6d:b3:0a
Dec 16 15:58:00 riker kernel: DVB: registering adapter 0 frontend 0 (Philips TDA10023 DVB-C)...
Dec 16 15:58:00 riker kernel: budget-av: ci interface initialised.
Dec 16 15:58:00 riker kernel: budget_av 0000:00:01.0: enabling device (0000 -> 0002)
Dec 16 15:58:00 riker kernel: IRQ 18/: IRQF_DISABLED is not guaranteed on shared IRQs
Dec 16 15:58:00 riker kernel: saa7146: found saa7146 @ mem ffffc900004d6000 (revision 1, irq 18) (0x1894,0x002c).
Dec 16 15:58:00 riker kernel: saa7146 (1): dma buffer size 1347584
Dec 16 15:58:00 riker kernel: DVB: registering new adapter (Satelco EasyWatch DVB-C MK3)
Dec 16 15:58:00 riker kernel: adapter failed MAC signature check
Dec 16 15:58:00 riker kernel: encoded MAC from EEPROM was ff:ff:ff:ff:ff:ff:ff:ff:ff:ff:ff:ff:ff:ff:ff:ff:ff:ff:ff:ff
Dec 16 15:58:00 riker kernel: KNC1-1: MAC addr = 00:09:d6:6d:b0:33
Dec 16 15:58:00 riker kernel: DVB: registering adapter 1 frontend 0 (Philips TDA10023 DVB-C)...
Dec 16 15:58:00 riker kernel: budget-av: ci interface initialised.
Dec 16 15:58:00 riker kernel: Adding 1048572k swap on /dev/xvda2.  Priority:-1 extents:1 across:1048572k SS
Dec 16 15:58:00 riker kernel: EXT3-fs (xvda1): warning: maximal mount count reached, running e2fsck is recommended
Dec 16 15:58:00 riker kernel: EXT3-fs (xvda1): using internal journal
Dec 16 15:58:00 riker kernel: kjournald starting.  Commit interval 5 seconds
Dec 16 15:58:00 riker kernel: EXT3-fs (xvdb1): warning: maximal mount count reached, running e2fsck is recommended
Dec 16 15:58:00 riker kernel: EXT3-fs (xvdb1): using internal journal
Dec 16 15:58:00 riker kernel: EXT3-fs (xvdb1): mounted filesystem with writeback data mode
Dec 16 15:58:00 riker kernel: RPC: Registered udp transport module.
Dec 16 15:58:00 riker kernel: RPC: Registered tcp transport module.
Dec 16 15:58:00 riker kernel: RPC: Registered tcp NFSv4.1 backchannel transport module.
Dec 16 15:58:00 riker kernel: Installing knfsd (copyright (C) 1996 okir@monad.swb.de).
Dec 16 15:58:00 riker kernel: NET: Registered protocol family 10
Dec 16 15:58:00 riker kernel: lo: Disabled Privacy Extensions
Dec 16 15:58:00 riker kernel: svc: failed to register lockdv1 RPC service (errno 97).
Dec 16 15:58:00 riker kernel: NFSD: Using /var/lib/nfs/v4recovery as the NFSv4 state recovery directory
Dec 16 15:58:00 riker kernel: NFSD: starting 90-second grace period

[-- Attachment #3: Type: text/plain, Size: 138 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: Load increase after memory upgrade (part2)
  2011-12-16 15:51                             ` Carsten Schiers
@ 2011-12-16 16:19                               ` Konrad Rzeszutek Wilk
  2011-12-17 22:12                                 ` Carsten Schiers
  0 siblings, 1 reply; 66+ messages in thread
From: Konrad Rzeszutek Wilk @ 2011-12-16 16:19 UTC (permalink / raw)
  To: Carsten Schiers; +Cc: linux, xen-devel, lersek, zhenzhong.duan, Ian Campbell

On Fri, Dec 16, 2011 at 04:51:47PM +0100, Carsten Schiers wrote:
> > And you are using swiotlb=force on the 2.6.34 classic kernel and passing in your budget-av card in it?
> 
> Yes, two of them with swiotlb=32,force.
> 
> 
> > Could you append the dmesg output please?
> 
> Attached. You find a "normal" boot after the one with the patched kernel.

Uh, what happens when you actually run the driver, meaning capture stuff? I remember
that with the pvops kernel you had about ~30K or so of bounces, but I am not sure about the bootup?

Thanks for being willing to be a guinea pig while trying to fix this.
> 
> Carsten.
> 
> 

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: Load increase after memory upgrade (part2)
  2011-12-16 16:19                               ` Konrad Rzeszutek Wilk
@ 2011-12-17 22:12                                 ` Carsten Schiers
  2011-12-18  0:19                                   ` Sander Eikelenboom
  2011-12-19 14:54                                   ` Konrad Rzeszutek Wilk
  0 siblings, 2 replies; 66+ messages in thread
From: Carsten Schiers @ 2011-12-17 22:12 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk
  Cc: linux, xen-devel, lersek, zhenzhong.duan, Ian Campbell

OK, double-checked. Both PCI cards are enabled, running, and working, but I get nothing but "SWIOTLB is 0% full". Is there any chance
to check that the patch is working? Does it print out something else with your setup? BR, Carsten.

-----Original Message-----
From: xen-devel-bounces@lists.xensource.com [mailto:xen-devel-bounces@lists.xensource.com] On Behalf Of Konrad Rzeszutek Wilk
Sent: Friday, 16 December 2011 17:19
To: Carsten Schiers
Cc: linux@eikelenboom.it; xen-devel; lersek@redhat.com; zhenzhong.duan@oracle.com; Ian Campbell
Subject: Re: [Xen-devel] Load increase after memory upgrade (part2)

On Fri, Dec 16, 2011 at 04:51:47PM +0100, Carsten Schiers wrote:
> > And you are using swiotlb=force on the 2.6.34 classic kernel and passing in your budget-av card in it?
> 
> Yes, two of them with swiotlb=32,force.
> 
> 
> > Could you append the dmesg output please?
> 
> Attached. You find a "normal" boot after the one with the patched kernel.

Uh, what happens when you run the driver, meaning capture stuff. I remember with the pvops you had about ~30K or so of bounces, but not sure about the bootup?

Thanks for being willing to be a guinea pig while trying to fix this.
> 
> Carsten.
> 
> 



_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: Load increase after memory upgrade (part2)
  2011-12-17 22:12                                 ` Carsten Schiers
@ 2011-12-18  0:19                                   ` Sander Eikelenboom
  2011-12-19 14:56                                     ` Konrad Rzeszutek Wilk
  2011-12-19 14:54                                   ` Konrad Rzeszutek Wilk
  1 sibling, 1 reply; 66+ messages in thread
From: Sander Eikelenboom @ 2011-12-18  0:19 UTC (permalink / raw)
  To: Carsten Schiers
  Cc: Ian Campbell, xen-devel, lersek, zhenzhong.duan, Konrad Rzeszutek Wilk

I have also done some experiments with the patch. In domU I also get the 0% full for my USB controllers with video grabbers; in dom0 I get 12% full, and both my Realtek 8169 ethernet controllers seem to use bounce buffering ...
And that with an (AMD) IOMMU? It all seems kind of strange, although it is also working ...
I don't have much time now; hoping to get back with a full report soon.

--
Sander

Saturday, December 17, 2011, 11:12:45 PM, you wrote:

> OK, double checked. Both PCI cards enabled, running, working, but nothing but "SWIOTLB is 0% full". Any chance
> to check that the patch is working? Does it print out something else with your setting? BR, Carsten.

> -----Original Message-----
> From: xen-devel-bounces@lists.xensource.com [mailto:xen-devel-bounces@lists.xensource.com] On Behalf Of Konrad Rzeszutek Wilk
> Sent: Friday, 16 December 2011 17:19
> To: Carsten Schiers
> Cc: linux@eikelenboom.it; xen-devel; lersek@redhat.com; zhenzhong.duan@oracle.com; Ian Campbell
> Subject: Re: [Xen-devel] Load increase after memory upgrade (part2)

> On Fri, Dec 16, 2011 at 04:51:47PM +0100, Carsten Schiers wrote:
>> > And you are using swiotlb=force on the 2.6.34 classic kernel and passing in your budget-av card in it?
>> 
>> Yes, two of them with swiotlb=32,force.
>> 
>> 
>> > Could you append the dmesg output please?
>> 
>> Attached. You find a "normal" boot after the one with the patched kernel.

> Uh, what happens when you run the driver, meaning capture stuff. I remember with the pvops you had about ~30K or so of bounces, but not sure about the bootup?

> Thanks for being willing to be a guinea pig while trying to fix this.
>> 
>> Carsten.
>> 
>> 



> _______________________________________________
> Xen-devel mailing list
> Xen-devel@lists.xensource.com
> http://lists.xensource.com/xen-devel




-- 
Best regards,
 Sander                            mailto:linux@eikelenboom.it

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: Load increase after memory upgrade (part2)
  2011-12-17 22:12                                 ` Carsten Schiers
  2011-12-18  0:19                                   ` Sander Eikelenboom
@ 2011-12-19 14:54                                   ` Konrad Rzeszutek Wilk
  1 sibling, 0 replies; 66+ messages in thread
From: Konrad Rzeszutek Wilk @ 2011-12-19 14:54 UTC (permalink / raw)
  To: Carsten Schiers
  Cc: xen-devel, Ian Campbell, Konrad Rzeszutek Wilk, zhenzhong.duan,
	linux, lersek

On Sat, Dec 17, 2011 at 11:12:45PM +0100, Carsten Schiers wrote:
> OK, double checked. Both PCI cards enabled, running, working, but nothing but "SWIOTLB is 0% full". Any chance
> to check that the patch is working? Does it print out something else with your setting? BR, Carsten.

Hm, and with the pvops kernel you got some numbers along with tons of 'bounce'.

The one thing that I neglected in this patch is the alloc_coherent
part, which I don't think is that important, as we did show that the
alloc buffers are used.

I don't have anything concrete yet, but after the holidays I should have a
better idea of what is happening. Thanks for being willing to test
this!
> 
> -----Original Message-----
> From: xen-devel-bounces@lists.xensource.com [mailto:xen-devel-bounces@lists.xensource.com] On Behalf Of Konrad Rzeszutek Wilk
> Sent: Friday, 16 December 2011 17:19
> To: Carsten Schiers
> Cc: linux@eikelenboom.it; xen-devel; lersek@redhat.com; zhenzhong.duan@oracle.com; Ian Campbell
> Subject: Re: [Xen-devel] Load increase after memory upgrade (part2)
> 
> On Fri, Dec 16, 2011 at 04:51:47PM +0100, Carsten Schiers wrote:
> > > And you are using swiotlb=force on the 2.6.34 classic kernel and passing in your budget-av card in it?
> > 
> > Yes, two of them with swiotlb=32,force.
> > 
> > 
> > > Could you append the dmesg output please?
> > 
> > Attached. You find a "normal" boot after the one with the patched kernel.
> 
> Uh, what happens when you run the driver, meaning capture stuff. I remember with the pvops you had about ~30K or so of bounces, but not sure about the bootup?
> 
> Thanks for being willing to be a guinea pig while trying to fix this.
> > 
> > Carsten.
> > 
> > 
> 
> 
> 
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@lists.xensource.com
> http://lists.xensource.com/xen-devel

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: Load increase after memory upgrade (part2)
  2011-12-18  0:19                                   ` Sander Eikelenboom
@ 2011-12-19 14:56                                     ` Konrad Rzeszutek Wilk
  2012-01-10 21:55                                       ` Konrad Rzeszutek Wilk
  0 siblings, 1 reply; 66+ messages in thread
From: Konrad Rzeszutek Wilk @ 2011-12-19 14:56 UTC (permalink / raw)
  To: Sander Eikelenboom
  Cc: xen-devel, Ian Campbell, Konrad Rzeszutek Wilk, Carsten Schiers,
	zhenzhong.duan, lersek

On Sun, Dec 18, 2011 at 01:19:16AM +0100, Sander Eikelenboom wrote:
> I also have done some experiments with the patch, in domU i also get the 0% full for my usb controllers with video grabbers , in dom0 my i get 12% full, both my realtek 8169 ethernet controllers seem to use the bounce buffering ...
> And that with a iommu (amd) ? it all seems kind of strange, although it is also working ...
> I'm not having much time now, hoping to get back with a full report soon.

Hm, so in domU it reports nothing, but in dom0 it does. Maybe the patch is
incorrect when running as a PV guest... I will look at it in more detail
after the holidays. Thanks for being willing to try it out.

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: Load increase after memory upgrade (part2)
  2011-12-19 14:56                                     ` Konrad Rzeszutek Wilk
@ 2012-01-10 21:55                                       ` Konrad Rzeszutek Wilk
  2012-01-12 22:06                                         ` Sander Eikelenboom
  0 siblings, 1 reply; 66+ messages in thread
From: Konrad Rzeszutek Wilk @ 2012-01-10 21:55 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk
  Cc: xen-devel, Ian Campbell, lersek, zhenzhong.duan,
	Sander Eikelenboom, Carsten Schiers

On Mon, Dec 19, 2011 at 10:56:09AM -0400, Konrad Rzeszutek Wilk wrote:
> On Sun, Dec 18, 2011 at 01:19:16AM +0100, Sander Eikelenboom wrote:
> > I also have done some experiments with the patch, in domU i also get the 0% full for my usb controllers with video grabbers , in dom0 my i get 12% full, both my realtek 8169 ethernet controllers seem to use the bounce buffering ...
> > And that with a iommu (amd) ? it all seems kind of strange, although it is also working ...
> > I'm not having much time now, hoping to get back with a full report soon.
> 
> Hm, so domU nothing, but dom0 it reports. Maybe the patch is incorrect
> when running as PV guest .. Will look in more details after the
> holidays. Thanks for being willing to try it out.

The good news is that I am able to reproduce this with my 32-bit NIC and a 3.2 domU:

[  771.896140] SWIOTLB is 11% full
[  776.896116] 0 [e1000 0000:00:00.0] bounce: from:222028(slow:0)to:2 map:222037 unmap:227220 sync:0
[  776.896126] 1 [e1000 0000:00:00.0] bounce: from:0(slow:0)to:5188 map:5188 unmap:0 sync:0
[  776.896133] 3 [e1000 0000:00:00.0] bounce: from:0(slow:0)to:1 map:1 unmap:0 sync:0

But interestingly enough, if I boot the guest as the first one, I do not get these bounce
requests. I will shortly boot up a XenOLinux kernel and see if I get the same
numbers.
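
Judging by the output above, the dump-swiotlb patch is essentially a kernel
thread that wakes every five seconds and prints the pool usage plus the
per-device bounce counters. A minimal sketch of the reporting loop:
swiotlb_used_slabs() is a hypothetical accessor standing in for the patch
reading the allocator's internal bookkeeping, while swiotlb_nr_tbl() is the
pool-size helper of that era:

    #include <linux/kernel.h>
    #include <linux/kthread.h>
    #include <linux/delay.h>
    #include <linux/swiotlb.h>

    /* Rough shape of the debug thread; started via
     * kthread_run(swiotlb_debug_fn, NULL, "swiotlb-debug"); */
    static int swiotlb_debug_fn(void *unused)
    {
            while (!kthread_should_stop()) {
                    unsigned long total = swiotlb_nr_tbl();     /* slabs in the pool */
                    unsigned long used = swiotlb_used_slabs();  /* HYPOTHETICAL */

                    if (total)
                            pr_info("SWIOTLB is %lu%% full\n",
                                    used * 100 / total);
                    msleep(5000);   /* matches the 5s cadence in the logs */
            }
            return 0;
    }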

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: Load increase after memory upgrade (part2)
  2012-01-10 21:55                                       ` Konrad Rzeszutek Wilk
@ 2012-01-12 22:06                                         ` Sander Eikelenboom
  2012-01-13  8:12                                           ` Jan Beulich
  2012-01-13 15:13                                           ` Konrad Rzeszutek Wilk
  0 siblings, 2 replies; 66+ messages in thread
From: Sander Eikelenboom @ 2012-01-12 22:06 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk; +Cc: xen-devel

Hello Konrad,

Tuesday, January 10, 2012, 10:55:33 PM, you wrote:

> On Mon, Dec 19, 2011 at 10:56:09AM -0400, Konrad Rzeszutek Wilk wrote:
>> On Sun, Dec 18, 2011 at 01:19:16AM +0100, Sander Eikelenboom wrote:
>> > I also have done some experiments with the patch, in domU i also get the 0% full for my usb controllers with video grabbers , in dom0 my i get 12% full, both my realtek 8169 ethernet controllers seem to use the bounce buffering ...
>> > And that with a iommu (amd) ? it all seems kind of strange, although it is also working ...
>> > I'm not having much time now, hoping to get back with a full report soon.
>> 
>> Hm, so domU nothing, but dom0 it reports. Maybe the patch is incorrect
>> when running as PV guest .. Will look in more details after the
>> holidays. Thanks for being willing to try it out.

> Good news is I am able to reproduce this with my 32-bit NIC with 3.2 domU:

> [  771.896140] SWIOTLB is 11% full
> [  776.896116] 0 [e1000 0000:00:00.0] bounce: from:222028(slow:0)to:2 map:222037 unmap:227220 sync:0
> [  776.896126] 1 [e1000 0000:00:00.0] bounce: from:0(slow:0)to:5188 map:5188 unmap:0 sync:0
> [  776.896133] 3 [e1000 0000:00:00.0] bounce: from:0(slow:0)to:1 map:1 unmap:0 sync:0

> but interestingly enough, if I boot the guest as the first one I do not get these bounce
> requests. I will shortly bootup a Xen-O-Linux kernel and see if I get these same
> numbers.


I started to experiment some more with what I encountered.

On dom0 I was seeing that my r8169 ethernet controllers were using bounce buffering, according to the dump-swiotlb module.
It was showing "12% full".
Checking in sysfs shows:
serveerstertje:/sys/bus/pci/devices/0000:09:00.0# cat consistent_dma_mask_bits
32
serveerstertje:/sys/bus/pci/devices/0000:09:00.0# cat dma_mask_bits
32

If I remember correctly, wasn't the allocation for dom0 changed to be at the top of memory instead of low... somewhere between 2.6.32 and 3.0?
Could that change cause all devices to need bounce buffering, and could it therefore explain some people seeing more CPU usage for dom0?

I have forced my r8169 to use a 64-bit DMA mask (using use_dac=1):
serveerstertje:/sys/bus/pci/devices/0000:09:00.0# cat consistent_dma_mask_bits
32
serveerstertje:/sys/bus/pci/devices/0000:09:00.0# cat dma_mask_bits
64

This results in dump-swiotlb reporting:

[ 1265.616106] 0 [r8169 0000:09:00.0] bounce: from:5(slow:0)to:0 map:0 unmap:0 sync:10
[ 1265.625043] SWIOTLB is 0% full
[ 1270.626085] 0 [r8169 0000:08:00.0] bounce: from:6(slow:0)to:0 map:0 unmap:0 sync:12
[ 1270.635024] SWIOTLB is 0% full
[ 1275.635091] 0 [r8169 0000:09:00.0] bounce: from:5(slow:0)to:0 map:0 unmap:0 sync:10
[ 1275.644261] SWIOTLB is 0% full
[ 1280.654097] 0 [r8169 0000:09:00.0] bounce: from:5(slow:0)to:0 map:0 unmap:0 sync:10



So it has changed from 12% to 0%, although it still reports something about bouncing? Or am I misinterpreting stuff?
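
For reference, use_dac=1 just makes the driver negotiate a wider streaming
DMA mask at probe time - and the sysfs output above shows the coherent mask
staying at 32 bits, so only the streaming mappings stop bouncing. A hedged
sketch with the standard PCI DMA API (example_probe_dma is an illustrative
name):

    #include <linux/pci.h>
    #include <linux/dma-mapping.h>

    static int example_probe_dma(struct pci_dev *pdev, bool use_dac)
    {
            /* Try 64-bit DMA first (what use_dac=1 asks for). */
            if (use_dac && !pci_set_dma_mask(pdev, DMA_BIT_MASK(64)))
                    return 0;   /* device reaches all memory: no bouncing */

            /* Fall back to 32 bits: buffers above 4GB will now bounce. */
            if (pci_set_dma_mask(pdev, DMA_BIT_MASK(32))) {
                    dev_err(&pdev->dev, "no usable DMA configuration\n");
                    return -EIO;
            }
            return 0;
    }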


Another thing I was wondering about: couldn't the hypervisor offer a small window of 32-bit addressable memory to all domUs (or only when PCI passthrough is used) to be used for DMA?

(Oh yes, I haven't got a clue what I'm talking about ... so it probably makes no sense at all :-) )


--
Sander

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: Load increase after memory upgrade (part2)
  2012-01-12 22:06                                         ` Sander Eikelenboom
@ 2012-01-13  8:12                                           ` Jan Beulich
  2012-01-13 15:13                                           ` Konrad Rzeszutek Wilk
  1 sibling, 0 replies; 66+ messages in thread
From: Jan Beulich @ 2012-01-13  8:12 UTC (permalink / raw)
  To: Sander Eikelenboom; +Cc: xen-devel, Konrad Rzeszutek Wilk

>>> On 12.01.12 at 23:06, Sander Eikelenboom <linux@eikelenboom.it> wrote:
> Another thing i was wondering about, couldn't the hypervisor offer a small 
> window in 32bit addressable mem to all (or only when pci passthrough is used) 
> domU's to be used for DMA ?

How would use of such a range be arbitrated/protected? You'd have to
ask for reservation (aka allocation) of a chunk anyway, which is as good
as using the existing interfaces to obtain address restricted memory
(and the hypervisor has a [rudimentary] mechanism to preserve some
low memory for DMA allocations).
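
To illustrate the existing interface: a coherent DMA allocation honours the
device's coherent mask, so with a 32-bit mask the allocation comes back
below 4GB (under Xen, backed by the hypervisor's exchange mechanism). A
minimal sketch; alloc_low_ring is an illustrative name:

    #include <linux/dma-mapping.h>

    /* Ask for address-restricted memory the normal way: with a 32-bit
     * coherent mask the buffer stays DMA-able below 4GB for its whole
     * lifetime, so it never needs bouncing. */
    static void *alloc_low_ring(struct device *dev, size_t size,
                                dma_addr_t *bus)
    {
            if (dma_set_coherent_mask(dev, DMA_BIT_MASK(32)))
                    return NULL;
            return dma_alloc_coherent(dev, size, bus, GFP_KERNEL);
    }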

Jan

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: Load increase after memory upgrade (part2)
  2012-01-12 22:06                                         ` Sander Eikelenboom
  2012-01-13  8:12                                           ` Jan Beulich
@ 2012-01-13 15:13                                           ` Konrad Rzeszutek Wilk
  2012-01-15 11:32                                             ` Sander Eikelenboom
  1 sibling, 1 reply; 66+ messages in thread
From: Konrad Rzeszutek Wilk @ 2012-01-13 15:13 UTC (permalink / raw)
  To: Sander Eikelenboom; +Cc: xen-devel

> >> > I also have done some experiments with the patch, in domU i also get the 0% full for my usb controllers with video grabbers , in dom0 my i get 12% full, both my realtek 8169 ethernet controllers seem to use the bounce buffering ...
> >> > And that with a iommu (amd) ? it all seems kind of strange, although it is also working ...
> >> > I'm not having much time now, hoping to get back with a full report soon.
> >> 
> >> Hm, so domU nothing, but dom0 it reports. Maybe the patch is incorrect
> >> when running as PV guest .. Will look in more details after the
> >> holidays. Thanks for being willing to try it out.
> 
> > Good news is I am able to reproduce this with my 32-bit NIC with 3.2 domU:
> 
> > [  771.896140] SWIOTLB is 11% full
> > [  776.896116] 0 [e1000 0000:00:00.0] bounce: from:222028(slow:0)to:2 map:222037 unmap:227220 sync:0
> > [  776.896126] 1 [e1000 0000:00:00.0] bounce: from:0(slow:0)to:5188 map:5188 unmap:0 sync:0
> > [  776.896133] 3 [e1000 0000:00:00.0] bounce: from:0(slow:0)to:1 map:1 unmap:0 sync:0
> 
> > but interestingly enough, if I boot the guest as the first one I do not get these bounce
> > requests. I will shortly bootup a Xen-O-Linux kernel and see if I get these same
> > numbers.
> 
> 
> I started to expiriment some more with what i encountered.
> 
> On dom0 i was seeing that my r8169 ethernet controllers where using bounce buffering with the dump-swiotlb module.
> It was showing "12% full".
> Checking in sysfs shows:
> serveerstertje:/sys/bus/pci/devices/0000:09:00.0# cat consistent_dma_mask_bits
> 32
> serveerstertje:/sys/bus/pci/devices/0000:09:00.0# cat dma_mask_bits
> 32
> 
> If i remember correctly wasn't the allocation for dom0 changed to be to the top of memory instead of low .. somewhere between 2.6.32 and 3.0 ?

? We never actually had dom0 support in the upstream kernel until 2.6.37. The
2.6.32<->2.6.36 trees you are referring to must have been the ones that I spun
up - but the implementation of SWIOTLB in them had not really changed.

> Could that change cause the need for all devices to need bounce buffering  and could it therefore explain some people seeing more cpu usage for dom0 ?

The issue I am seeing is not CPU usage in dom0, but rather the CPU usage in
the domU guests, and that the older domUs (XenOLinux) do not have this.

That is what I can't understand - the implementation in both cases _looks_
to do the same thing. There was one issue I found in the upstream one, but
even with that fix I still get that "bounce" usage in domU.

Interestingly enough, I get that only if I have launched, destroyed,
launched, etc., the guest multiple times. Which leads me to believe this is
not a kernel issue, but that we have simply fragmented the Xen memory so
much that when it launches the guest, all of the memory is above 4GB. But
that seems counter-intuitive, as by default Xen starts guests at the far
end of memory (so on my 16GB box it would stick a 4GB guest at roughly
12GB->16GB). The SWIOTLB swizzles some memory under the 4GB mark, and this
is where we get the bounce buffer effect (as the memory under 4GB is then
copied to the memory at 12GB->16GB).

But that does not explain why on the first couple of starts I did not see
this with pvops. And it does not seem to happen with the XenOLinux kernel,
so there must be something else going on here.

> 
> I have forced my r8169 to use 64bits dma mask (using use_dac=1)

Ah yes.
> serveerstertje:/sys/bus/pci/devices/0000:09:00.0# cat consistent_dma_mask_bits
> 32
> serveerstertje:/sys/bus/pci/devices/0000:09:00.0# cat dma_mask_bits
> 64
> 
> This results in dump-swiotlb reporting:
> 
> [ 1265.616106] 0 [r8169 0000:09:00.0] bounce: from:5(slow:0)to:0 map:0 unmap:0 sync:10
> [ 1265.625043] SWIOTLB is 0% full
> [ 1270.626085] 0 [r8169 0000:08:00.0] bounce: from:6(slow:0)to:0 map:0 unmap:0 sync:12
> [ 1270.635024] SWIOTLB is 0% full
> [ 1275.635091] 0 [r8169 0000:09:00.0] bounce: from:5(slow:0)to:0 map:0 unmap:0 sync:10
> [ 1275.644261] SWIOTLB is 0% full
> [ 1280.654097] 0 [r8169 0000:09:00.0] bounce: from:5(slow:0)to:0 map:0 unmap:0 sync:10

Which is what we expect. No need to bounce since the PCI adapter can reach memory
above the 4GB mark.

> 
> 
> 
> So it has changed from 12% to 0%, although it still reports something about bouncing ? or am i mis interpreting stuff ?

The bouncing can happen in two cases (see the sketch below):
 - the memory is above 4GB, or
 - the memory crosses a page boundary (rarely happens).
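
A simplified sketch of that per-mapping decision (under Xen, phys_to_dma()
yields the machine address; a buffer whose guest-contiguous pages are not
machine-contiguous has to bounce as well - see
range_straddles_page_boundary() in swiotlb-xen):

    #include <linux/dma-mapping.h>

    static bool needs_bounce(struct device *dev, phys_addr_t phys,
                             size_t size)
    {
            dma_addr_t dma = phys_to_dma(dev, phys);

            /* Fails e.g. for memory above 4GB with a 32-bit DMA mask. */
            return !dma_capable(dev, dma, size);
    }
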
> 
> 
> Another thing i was wondering about, couldn't the hypervisor offer a small window in 32bit addressable mem to all (or only when pci passthrough is used) domU's to be used for DMA ?

It does. That is what the Xen SWIOTLB does with "swizzling" the pages in its pool.
But it can't do it for every part of memory. That is why there are DMA pools,
which are used by graphics adapters, video capture devices, storage and network
drivers. They are used for small packet sizes so that the driver does not have
to allocate DMA buffers when it gets a 100-byte ping response. But for large
packets (say that ISO file you are downloading) the driver allocates memory on
the fly and "maps" it into the PCI space using the DMA API. That "mapping" sets
up a "physical memory" -> "guest memory" translation - and if the allocated
memory is above 4GB, part of this mapping is to copy ("bounce") the memory
below 4GB (where Xen-SWIOTLB has allocated a pool), so that the adapter
can physically fetch/put the data. Once that is completed, it is "sync"-ed
back, which is bouncing that data to the originally allocated memory.
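
In driver terms, that lifecycle looks roughly like this (a hedged sketch of
the streaming DMA API; rx_one_buffer is an illustrative name):

    #include <linux/dma-mapping.h>
    #include <linux/slab.h>

    static int rx_one_buffer(struct device *dev, size_t len)
    {
            void *buf = kmalloc(len, GFP_KERNEL);   /* may land above 4GB */
            dma_addr_t handle;

            if (!buf)
                    return -ENOMEM;
            handle = dma_map_single(dev, buf, len, DMA_FROM_DEVICE);
            if (dma_mapping_error(dev, handle)) {
                    kfree(buf);
                    return -EIO;
            }
            /* ... the device DMAs to 'handle'; if buf was unreachable,
             * swiotlb pointed 'handle' at a bounce page below 4GB ... */
            dma_unmap_single(dev, handle, len, DMA_FROM_DEVICE);
            /* for DMA_FROM_DEVICE the unmap copies ("bounces") the data
             * from the bounce page back into buf */
            kfree(buf);
            return 0;
    }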

So having a DMA pool is very good - and most drivers use it. The things I can't
figure out are:
 - why the DVB drivers do not seem to use it, even though they look to use the
   videobuf_dma driver.
 - why XenOLinux does not seem to have this problem (and this might be false -
   perhaps it does have this problem and it just takes a couple of guest
   launches, destructions, starts, etc. to actually see it).
 - whether there are any flags in the domain builder to say: "OK, this domain
   is going to service 32-bit cards, hence build the memory from 0->4GB". This
   seems like a good knob at first, but it probably is a bad idea (imagine
   using it by mistake on every guest). And also, nowadays most cards are PCIe
   and can do 64-bit, so it would not be that important in the future.
> 
> (oh yes, i haven't got i clue what i'm talking about ... so it probably make no sense at all :-) )

Nonsense. You were on the correct path. Hopefully the level of detail hasn't
scared you off now :-)

> 
> 
> --
> Sander
> 
> 

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: Load increase after memory upgrade (part2)
  2012-01-13 15:13                                           ` Konrad Rzeszutek Wilk
@ 2012-01-15 11:32                                             ` Sander Eikelenboom
  2012-01-17 21:02                                               ` Konrad Rzeszutek Wilk
  0 siblings, 1 reply; 66+ messages in thread
From: Sander Eikelenboom @ 2012-01-15 11:32 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk; +Cc: xen-devel


Friday, January 13, 2012, 4:13:07 PM, you wrote:

>> >> > I also have done some experiments with the patch, in domU i also get the 0% full for my usb controllers with video grabbers , in dom0 my i get 12% full, both my realtek 8169 ethernet controllers seem to use the bounce buffering ...
>> >> > And that with a iommu (amd) ? it all seems kind of strange, although it is also working ...
>> >> > I'm not having much time now, hoping to get back with a full report soon.
>> >> 
>> >> Hm, so domU nothing, but dom0 it reports. Maybe the patch is incorrect
>> >> when running as PV guest .. Will look in more details after the
>> >> holidays. Thanks for being willing to try it out.
>> 
>> > Good news is I am able to reproduce this with my 32-bit NIC with 3.2 domU:
>> 
>> > [  771.896140] SWIOTLB is 11% full
>> > [  776.896116] 0 [e1000 0000:00:00.0] bounce: from:222028(slow:0)to:2 map:222037 unmap:227220 sync:0
>> > [  776.896126] 1 [e1000 0000:00:00.0] bounce: from:0(slow:0)to:5188 map:5188 unmap:0 sync:0
>> > [  776.896133] 3 [e1000 0000:00:00.0] bounce: from:0(slow:0)to:1 map:1 unmap:0 sync:0
>> 
>> > but interestingly enough, if I boot the guest as the first one I do not get these bounce
>> > requests. I will shortly bootup a Xen-O-Linux kernel and see if I get these same
>> > numbers.
>> 
>> 
>> I started to expiriment some more with what i encountered.
>> 
>> On dom0 i was seeing that my r8169 ethernet controllers where using bounce buffering with the dump-swiotlb module.
>> It was showing "12% full".
>> Checking in sysfs shows:
>> serveerstertje:/sys/bus/pci/devices/0000:09:00.0# cat consistent_dma_mask_bits
>> 32
>> serveerstertje:/sys/bus/pci/devices/0000:09:00.0# cat dma_mask_bits
>> 32
>> 
>> If i remember correctly wasn't the allocation for dom0 changed to be to the top of memory instead of low .. somewhere between 2.6.32 and 3.0 ?

> ? We never actually had dom0 support in the upstream kernel until 2.6.37.. The 2.6.32<->2.6.36 you are
> referring to must have been the trees that I spun up - but the implementation of SWIOTLB in them
> had not really changed.

>> Could that change cause the need for all devices to need bounce buffering  and could it therefore explain some people seeing more cpu usage for dom0 ?

> The issue I am seeing is not CPU usage in dom0, but rather the CPU usage in domU with guests.
> And that the older domU's (XenOLinux) do not have this.

> That I can't understand - the implementation in both cases _looks_ to do the same thing.
> There was one issue I found in the upstream one, but even with that fix I still
> get that "bounce" usage in domU.

> Interestingly enough, I get that only if I have launched, destroyed, launched, etc, the guest multiple
> times before I get this. Which leads me to believe this is not a kernel issue but that we
> are simply fragmented the Xen memory so much, so that when it launches the guest all of the
> memory is above 4GB. But that seems counter-intuive as by default Xen starts guests at the far end of
> memory (so on my 16GB box it would stick a 4GB guest at 12GB->16GB roughly). The SWIOTLB
> swizzles some memory under the 4GB , and this is where we get the bounce buffer effect
> (as the memory from 4GB is then copied to the memory 12GB->16GB).

> But it does not explain why on the first couple of starts I did not see this with pvops.
> And it does not seem to happen with the XenOLinux kernel, so there must be something else
> in here.

>> 
>> I have forced my r8169 to use 64bits dma mask (using use_dac=1)

> Ah yes.
>> serveerstertje:/sys/bus/pci/devices/0000:09:00.0# cat consistent_dma_mask_bits
>> 32
>> serveerstertje:/sys/bus/pci/devices/0000:09:00.0# cat dma_mask_bits
>> 64
>> 
>> This results in dump-swiotlb reporting:
>> 
>> [ 1265.616106] 0 [r8169 0000:09:00.0] bounce: from:5(slow:0)to:0 map:0 unmap:0 sync:10
>> [ 1265.625043] SWIOTLB is 0% full
>> [ 1270.626085] 0 [r8169 0000:08:00.0] bounce: from:6(slow:0)to:0 map:0 unmap:0 sync:12
>> [ 1270.635024] SWIOTLB is 0% full
>> [ 1275.635091] 0 [r8169 0000:09:00.0] bounce: from:5(slow:0)to:0 map:0 unmap:0 sync:10
>> [ 1275.644261] SWIOTLB is 0% full
>> [ 1280.654097] 0 [r8169 0000:09:00.0] bounce: from:5(slow:0)to:0 map:0 unmap:0 sync:10

> Which is what we expect. No need to bounce since the PCI adapter can reach memory
> above the 4GB mark.

>> 
>> 
>> 
>> So it has changed from 12% to 0%, although it still reports something about bouncing ? or am i mis interpreting stuff ?

> The bouncing can happen due to two cases:
>  - Memory is above 4GB
>  - Memory crosses a page-boundary (rarely happens).
>> 
>> 
>> Another thing i was wondering about, couldn't the hypervisor offer a small window in 32bit addressable mem to all (or only when pci passthrough is used) domU's to be used for DMA ?

> It does. That is what the Xen SWIOTLB does with "swizzling" the pages in its pool.
> But it can't do it for every part of memory. That is why there are DMA pools
> which are used by graphics adapters, video capture devices,storage and network
> drivers. They are used for small packet sizes so that the driver does not have
> to allocate DMA buffers when it gets a 100bytes ping response. But for large
> packets (say that ISO file you are downloading) it allocates memory on the fly
> and "maps" it into the PCI space using the DMA API. That "mapping" sets up
> an "physical memory" -> "guest memory" translation - and if that allocated
> memory is above 4GB, part of this mapping is to copy ("bounce") the memory
> under the 4GB (where XenSWIOTLB has allocated a pool), so that the adapter
> can physically fetch/put the data. Once that is completed it is "sync"-ed
> back, which is bouncing that data to the "allocated memory".


> So having a DMA pool is very good - and most drivers use it. The things I can't
> figure out are:
>  - why the DVB drivers do not seem to use it, even though they appear to use the videobuf_dma
>    driver.
>  - why the XenOLinux kernel does not seem to have this problem (and this might be false - 
>    perhaps it does have this problem and it just takes a couple of guest launches,
>    destructions, starts, etc. to actually see it).
>  - are there any flags in the domain builder to say: "ok, this domain is going to
>    service 32-bit cards, hence build the memory from 0->4GB". This seems like
>    a good knob at first, but it is probably a bad idea (imagine using it by mistake
>    on every guest). And also nowadays most cards are PCIe and can do 64-bit, so
>    it would not be that important in the future.
>> 
>> (oh yes, I haven't got a clue what I'm talking about ... so it probably makes no sense at all :-) )

> Nonsense. You were on the correct path. Hopefully the level of detail hasn't
> scared you off now :-)

Well, it only raises some more questions :-)
The thing is, PCI passthrough, and especially the DMA part of it, all works behind the scenes without giving much output about how it is actually working.

The thing I was wondering about is whether my AMD IOMMU is actually doing something for PV guests.
When booting with iommu=off (the machine has 8GB of memory, dom0 limited to 1024M) and starting just one DomU with iommu=soft, with PCI passthrough of the USB PCI cards that have the USB video grabbers attached, I would expect to find some bounce buffering going on.

                (HV_START_LOW 18446603336221196288)
                (FEATURES '!writable_page_tables|pae_pgdir_above_4gb')
                (VIRT_BASE 18446744071562067968)
                (GUEST_VERSION 2.6)
                (PADDR_OFFSET 0)
                (GUEST_OS linux)
                (HYPERCALL_PAGE 18446744071578849280)
                (LOADER generic)
                (SUSPEND_CANCEL 1)
                (PAE_MODE yes)
                (ENTRY 18446744071594476032)
                (XEN_VERSION xen-3.0)

Still I only see:

[   47.449072] Starting SWIOTLB debug thread.
[   47.449090] swiotlb_start_thread: Go!
[   47.449262] xen_swiotlb_start_thread: Go!
[   52.449158] 0 [ehci_hcd 0000:0a:00.3] bounce: from:432(slow:0)to:1329 map:1756 unmap:1781 sync:0
[   52.449180] 1 [ehci_hcd 0000:0a:00.7] bounce: from:0(slow:0)to:16 map:23 unmap:0 sync:0
[   52.449187] 2 [ohci_hcd 0000:0a:00.4] bounce: from:0(slow:0)to:4 map:5 unmap:0 sync:0
[   52.449226] SWIOTLB is 0% full
[   57.449180] 0 ehci_hcd 0000:0a:00.3 alloc coherent: 35, free: 0
[   57.449219] 1 ohci_hcd 0000:0a:00.6 alloc coherent: 1, free: 0
[   57.449265] SWIOTLB is 0% full
[   62.449176] SWIOTLB is 0% full
[   67.449336] SWIOTLB is 0% full
[   72.449279] SWIOTLB is 0% full
[   77.449121] SWIOTLB is 0% full
[   82.449236] SWIOTLB is 0% full
[   87.449242] SWIOTLB is 0% full
[   92.449241] SWIOTLB is 0% full
[  172.449102] 0 [ehci_hcd 0000:0a:00.7] bounce: from:3839(slow:0)to:664 map:4486 unmap:4617 sync:0
[  172.449123] 1 [ehci_hcd 0000:0a:00.7] bounce: from:0(slow:0)to:82 map:111 unmap:0 sync:0
[  172.449130] 2 [ehci_hcd 0000:0a:00.7] bounce: from:0(slow:0)to:32 map:36 unmap:0 sync:0
[  172.449170] SWIOTLB is 0% full
[  177.449109] 0 [ehci_hcd 0000:0a:00.7] bounce: from:5348(slow:0)to:524 map:5834 unmap:5952 sync:0
[  177.449131] 1 [ehci_hcd 0000:0a:00.7] bounce: from:0(slow:0)to:76 map:112 unmap:0 sync:0
[  177.449138] 2 [ehci_hcd 0000:0a:00.7] bounce: from:0(slow:0)to:4 map:6 unmap:0 sync:0
[  177.449178] SWIOTLB is 0% full
[  182.449143] 0 [ehci_hcd 0000:0a:00.7] bounce: from:5349(slow:0)to:563 map:5899 unmap:5949 sync:0
[  182.449157] 1 [ehci_hcd 0000:0a:00.7] bounce: from:0(slow:0)to:27 map:35 unmap:0 sync:0
[  182.449164] 2 [ehci_hcd 0000:0a:00.7] bounce: from:0(slow:0)to:10 map:15 unmap:0 sync:0
[  182.449204] SWIOTLB is 0% full
[  187.449112] 0 [ehci_hcd 0000:0a:00.7] bounce: from:5375(slow:0)to:592 map:5941 unmap:6022 sync:0
[  187.449126] 1 [ehci_hcd 0000:0a:00.7] bounce: from:0(slow:0)to:46 map:69 unmap:0 sync:0
[  187.449133] 2 [ehci_hcd 0000:0a:00.7] bounce: from:0(slow:0)to:9 map:12 unmap:0 sync:0
[  187.449173] SWIOTLB is 0% full
[  192.449183] 0 [ehci_hcd 0000:0a:00.7] bounce: from:5360(slow:0)to:556 map:5890 unmap:5978 sync:0
[  192.449226] 1 [ehci_hcd 0000:0a:00.7] bounce: from:0(slow:0)to:52 map:74 unmap:0 sync:0
[  192.449234] 2 [ehci_hcd 0000:0a:00.7] bounce: from:0(slow:0)to:10 map:14 unmap:0 sync:0
[  192.449275] SWIOTLB is 0% full

And the devices do work ... so how does that work ...

Thx for your explanation so far!

--
Sander







>> 
>> 
>> --
>> Sander
>> 
>> 



-- 
Best regards,
 Sander                            mailto:linux@eikelenboom.it

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: Load increase after memory upgrade (part2)
  2012-01-15 11:32                                             ` Sander Eikelenboom
@ 2012-01-17 21:02                                               ` Konrad Rzeszutek Wilk
  2012-01-18 11:28                                                 ` Pasi Kärkkäinen
  2012-01-18 11:35                                                 ` Jan Beulich
  0 siblings, 2 replies; 66+ messages in thread
From: Konrad Rzeszutek Wilk @ 2012-01-17 21:02 UTC (permalink / raw)
  To: Sander Eikelenboom; +Cc: xen-devel

> The thing i was wondering about is if my AMD IOMMU is actually doing something for PV guests.
> When booting with iommu=off machine has 8GB mem, dom0 limited to 1024M and just starting one domU with iommu=soft, with pci-passthrough and the USB pci-cards with USB videograbbers attached to it, i would expect to find some bounce buffering going.
> 
>                 (HV_START_LOW 18446603336221196288)
>                 (FEATURES '!writable_page_tables|pae_pgdir_above_4gb')
>                 (VIRT_BASE 18446744071562067968)
>                 (GUEST_VERSION 2.6)
>                 (PADDR_OFFSET 0)
>                 (GUEST_OS linux)
>                 (HYPERCALL_PAGE 18446744071578849280)
>                 (LOADER generic)
>                 (SUSPEND_CANCEL 1)
>                 (PAE_MODE yes)
>                 (ENTRY 18446744071594476032)
>                 (XEN_VERSION xen-3.0)
> 
> Still i only see:
> 
> [   47.449072] Starting SWIOTLB debug thread.
> [   47.449090] swiotlb_start_thread: Go!
> [   47.449262] xen_swiotlb_start_thread: Go!
> [   52.449158] 0 [ehci_hcd 0000:0a:00.3] bounce: from:432(slow:0)to:1329 map:1756 unmap:1781 sync:0

There is bouncing there.
..
> [  172.449102] 0 [ehci_hcd 0000:0a:00.7] bounce: from:3839(slow:0)to:664 map:4486 unmap:4617 sync:0

And there.. 3839 of them.
> [  172.449123] 1 [ehci_hcd 0000:0a:00.7] bounce: from:0(slow:0)to:82 map:111 unmap:0 sync:0
> [  172.449130] 2 [ehci_hcd 0000:0a:00.7] bounce: from:0(slow:0)to:32 map:36 unmap:0 sync:0
> [  172.449170] SWIOTLB is 0% full
> [  177.449109] 0 [ehci_hcd 0000:0a:00.7] bounce: from:5348(slow:0)to:524 map:5834 unmap:5952 sync:0

And 5348 here!

So bounce-buffering is definitely happening with this guest.
.. snip..
> 
> And the devices do work ... so how does that work ...

Most (all?) drivers are written to work with bounce-buffering.
That has never been a problem.

The issue as I understand it is that the DVB drivers allocate their buffers
from 0->4GB most (all?) of the time, so they never have to do bounce-buffering.

While the pv-ops one ends up quite frequently doing the bounce-buffering, which
implies that the DVB drivers end up allocating their buffers above the 4GB mark.
This means we end up spending some CPU time (in the guest) copying the memory
from the >4GB region to the 0-4GB region (and vice versa).
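
The twist is that under PV the reachability test has to be done on machine
addresses, not guest-physical ones. Illustrative only (pfn_to_mfn is the
standard helper from asm/xen/page.h):

	/* Is this page's *machine* address below 4GB? */
	static bool page_below_4g(struct page *page)
	{
		unsigned long mfn = pfn_to_mfn(page_to_pfn(page));

		return mfn < (1UL << (32 - PAGE_SHIFT));
	}

A guest pfn below 4GB gives no such guarantee for the mfn backing it.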

And I am not clear why this is happening. Hence my thought
was to run a Xen-O-Linux kernel v2.6.3X and a PVOPS v2.6.3X (where X is the
same) with the same PCI device (and the test would entail rebooting the
box in between the launches) to confirm that the Xen-O-Linux kernel is doing something
that the PVOPS one is not.

So far, I haven't had much luck compiling a Xen-O-Linux v2.6.38 kernel,
so :-(

> 
> Thx for your explanation so far !

Sure thing.

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: Load increase after memory upgrade (part2)
  2012-01-17 21:02                                               ` Konrad Rzeszutek Wilk
@ 2012-01-18 11:28                                                 ` Pasi Kärkkäinen
  2012-01-18 11:39                                                   ` Jan Beulich
  2012-01-18 11:35                                                 ` Jan Beulich
  1 sibling, 1 reply; 66+ messages in thread
From: Pasi Kärkkäinen @ 2012-01-18 11:28 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk; +Cc: Sander Eikelenboom, xen-devel

On Tue, Jan 17, 2012 at 04:02:25PM -0500, Konrad Rzeszutek Wilk wrote:
> > 
> > And the devices do work ... so how does that work ...
> 
> Most (all?) drivers are written to work with bounce-buffering.
> That has never been a problem.
> 
> The issue as I understand is that the DVB drivers allocate their buffers
> from 0->4GB most (all the time?) so they never have to do bounce-buffering.
> 
> While the pv-ops one ends up quite frequently doing the bounce-buffering, which
> implies that the DVB drivers end up allocating their buffers above the 4GB.
> This means we end up spending some CPU time (in the guest) copying the memory
> from >4GB to 0-4GB region (And vice-versa).
> 
> And I am not clear why this is happening. Hence my thought
> was to run an Xen-O-Linux kernel v2.6.3X and a PVOPS v2.6.3X (where X is the
> same) with the same PCI device (and the test would entail rebooting the
> box in between the launches) to confirm that the Xen-O-Linux is doing something
> that the PVOPS is not.
> 
> So far, I've haven't had much luck compiling a Xen-O-Linux v2.6.38 kernel
> so :-(
> 

Did you try downloading a binary rpm (or src.rpm) from openSUSE? 
I think they have a 2.6.38 XenLinux kernel available.

-- Pasi

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: Load increase after memory upgrade (part2)
  2012-01-17 21:02                                               ` Konrad Rzeszutek Wilk
  2012-01-18 11:28                                                 ` Pasi Kärkkäinen
@ 2012-01-18 11:35                                                 ` Jan Beulich
  2012-01-18 14:29                                                   ` Konrad Rzeszutek Wilk
  1 sibling, 1 reply; 66+ messages in thread
From: Jan Beulich @ 2012-01-18 11:35 UTC (permalink / raw)
  To: Sander Eikelenboom, Konrad Rzeszutek Wilk; +Cc: xen-devel

>>> On 17.01.12 at 22:02, Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> wrote:
> The issue as I understand is that the DVB drivers allocate their buffers
> from 0->4GB most (all the time?) so they never have to do bounce-buffering.
> 
> While the pv-ops one ends up quite frequently doing the bounce-buffering, 
> which
> implies that the DVB drivers end up allocating their buffers above the 4GB.
> This means we end up spending some CPU time (in the guest) copying the 
> memory
> from >4GB to 0-4GB region (And vice-versa).

This reminds me of something (not sure what XenoLinux you use for
comparison) - how are they allocating that memory? Not vmalloc_32()
by chance (I remember having seen numerous uses under - iirc -
drivers/media/)?

Obviously, vmalloc_32() and any GFP_DMA32 allocations do *not* do
what their (driver) callers might expect in a PV guest (including the
contiguity assumption for the latter, recalling that you earlier said
you were able to see the problem after several guest starts), and I
had put into our kernels an adjustment to make vmalloc_32() actually
behave as expected.
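
A quick way to observe the mismatch (illustrative, not from any tree): walk
a vmalloc_32() area and count backing pages whose machine address is at or
above 4GB - in a PV guest this can be well above zero:

	static int count_high_mfns(void *vaddr, int nr_pages)
	{
		int i, high = 0;

		for (i = 0; i < nr_pages; i++) {
			struct page *pg = vmalloc_to_page(vaddr + i * PAGE_SIZE);

			if (pfn_to_mfn(page_to_pfn(pg)) >= (1UL << (32 - PAGE_SHIFT)))
				high++;
		}
		return high;
	}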

Jan

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: Load increase after memory upgrade (part2)
  2012-01-18 11:28                                                 ` Pasi Kärkkäinen
@ 2012-01-18 11:39                                                   ` Jan Beulich
  0 siblings, 0 replies; 66+ messages in thread
From: Jan Beulich @ 2012-01-18 11:39 UTC (permalink / raw)
  To: Pasi Kärkkäinen, Konrad Rzeszutek Wilk
  Cc: Sander Eikelenboom, xen-devel

>>> On 18.01.12 at 12:28, Pasi Kärkkäinen<pasik@iki.fi> wrote:
> On Tue, Jan 17, 2012 at 04:02:25PM -0500, Konrad Rzeszutek Wilk wrote:
>> > 
>> > And the devices do work ... so how does that work ...
>> 
>> Most (all?) drivers are written to work with bounce-buffering.
>> That has never been a problem.
>> 
>> The issue as I understand is that the DVB drivers allocate their buffers
>> from 0->4GB most (all the time?) so they never have to do bounce-buffering.
>> 
>> While the pv-ops one ends up quite frequently doing the bounce-buffering, 
> which
>> implies that the DVB drivers end up allocating their buffers above the 4GB.
>> This means we end up spending some CPU time (in the guest) copying the 
> memory
>> from >4GB to 0-4GB region (And vice-versa).
>> 
>> And I am not clear why this is happening. Hence my thought
>> was to run an Xen-O-Linux kernel v2.6.3X and a PVOPS v2.6.3X (where X is the
>> same) with the same PCI device (and the test would entail rebooting the
>> box in between the launches) to confirm that the Xen-O-Linux is doing 
> something
>> that the PVOPS is not.
>> 
>> So far, I've haven't had much luck compiling a Xen-O-Linux v2.6.38 kernel
>> so :-(
>> 
> 
> Did you try downloading a binary rpm (or src.rpm) from OpenSuse? 
> I think they have 2.6.38 xenlinux kernel available.

openSUSE 11.4 is using 2.6.37; 12.1 is on 3.1 (and SLE is on 3.0).
Pulling out (consistent) patches at 2.6.38 level might be a little
involved.

Jan

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: Load increase after memory upgrade (part2)
  2012-01-18 11:35                                                 ` Jan Beulich
@ 2012-01-18 14:29                                                   ` Konrad Rzeszutek Wilk
  2012-01-23 22:32                                                     ` Konrad Rzeszutek Wilk
  0 siblings, 1 reply; 66+ messages in thread
From: Konrad Rzeszutek Wilk @ 2012-01-18 14:29 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Sander Eikelenboom, xen-devel, Konrad Rzeszutek Wilk

On Wed, Jan 18, 2012 at 11:35:35AM +0000, Jan Beulich wrote:
> >>> On 17.01.12 at 22:02, Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> wrote:
> > The issue as I understand is that the DVB drivers allocate their buffers
> > from 0->4GB most (all the time?) so they never have to do bounce-buffering.
> > 
> > While the pv-ops one ends up quite frequently doing the bounce-buffering, 
> > which
> > implies that the DVB drivers end up allocating their buffers above the 4GB.
> > This means we end up spending some CPU time (in the guest) copying the 
> > memory
> > from >4GB to 0-4GB region (And vice-versa).
> 
> This reminds me of something (not sure what XenoLinux you use for
> comparison) - how are they allocating that memory? Not vmalloc_32()

I was using the 2.6.18 one, then the one I saw on Google for Gentoo, and now
I am going to look at the 2.6.38 one from openSUSE.

> by chance (I remember having seen numerous uses under - iirc -
> drivers/media/)?
> 
> Obviously, vmalloc_32() and any GFP_DMA32 allocations do *not* do
> what their (driver) callers might expect in a PV guest (including the
> contiguity assumption for the latter, recalling that you earlier said
> you were able to see the problem after several guest starts), and I
> had put into our kernels an adjustment to make vmalloc_32() actually
> behave as expected.

Aaah.. The plot thickens! Let me look in the sources! Thanks for the
pointer.

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: Load increase after memory upgrade (part2)
  2012-01-18 14:29                                                   ` Konrad Rzeszutek Wilk
@ 2012-01-23 22:32                                                     ` Konrad Rzeszutek Wilk
  2012-01-24  8:58                                                       ` Jan Beulich
                                                                         ` (3 more replies)
  0 siblings, 4 replies; 66+ messages in thread
From: Konrad Rzeszutek Wilk @ 2012-01-23 22:32 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk; +Cc: Sander Eikelenboom, xen-devel, Jan Beulich

[-- Attachment #1: Type: text/plain, Size: 1974 bytes --]

On Wed, Jan 18, 2012 at 10:29:23AM -0400, Konrad Rzeszutek Wilk wrote:
> On Wed, Jan 18, 2012 at 11:35:35AM +0000, Jan Beulich wrote:
> > >>> On 17.01.12 at 22:02, Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> wrote:
> > > The issue as I understand is that the DVB drivers allocate their buffers
> > > from 0->4GB most (all the time?) so they never have to do bounce-buffering.
> > > 
> > > While the pv-ops one ends up quite frequently doing the bounce-buffering, 
> > > which
> > > implies that the DVB drivers end up allocating their buffers above the 4GB.
> > > This means we end up spending some CPU time (in the guest) copying the 
> > > memory
> > > from >4GB to 0-4GB region (And vice-versa).
> > 
> > This reminds me of something (not sure what XenoLinux you use for
> > comparison) - how are they allocating that memory? Not vmalloc_32()
> 
> I was using the 2.6.18, then the one I saw on Google for Gentoo, and now
> I am going to look at the 2.6.38 from OpenSuSE.
> 
> > by chance (I remember having seen numerous uses under - iirc -
> > drivers/media/)?
> > 
> > Obviously, vmalloc_32() and any GFP_DMA32 allocations do *not* do
> > what their (driver) callers might expect in a PV guest (including the
> > contiguity assumption for the latter, recalling that you earlier said
> > you were able to see the problem after several guest starts), and I
> > had put into our kernels an adjustment to make vmalloc_32() actually
> > behave as expected.
> 
> Aaah.. The plot thickens! Let me look in the sources! Thanks for the
> pointer.

Jan's hints led me to videobuf-dma-sg.c, which does indeed do vmalloc_32
and then performs PCI DMA operations on the allocated vmalloc_32
area.

So I cobbled up the attached patch (I haven't actually tested it and sadly
won't be able to until next week) which removes the call to vmalloc_32 and instead
sets up a DMA-allocated set of pages.

If that fixes it for you that is awesome, but if it breaks please
send me your logs.

Cheers,
Konrad

[-- Attachment #2: vmalloc --]
[-- Type: text/plain, Size: 3726 bytes --]

commit 0b5428f4a22be4855b5f03aa1369f9e30e095014
Author: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Date:   Mon Jan 23 15:52:01 2012 -0500

    vmalloc_sg: make sure all pages in vmalloc area are really DMA-ready
    
    Under Xen, vmalloc_32() isn't guaranteed to return pages which are really
    under 4G in machine physical addresses (only in virtual pseudo-physical
    addresses).  To work around this, implement a vmalloc variant which
    allocates each page with dma_alloc_coherent() to guarantee that each
    page is suitable for the device in question.
    
    Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>

diff --git a/drivers/media/video/videobuf-dma-sg.c b/drivers/media/video/videobuf-dma-sg.c
index f300dea..3da2428 100644
--- a/drivers/media/video/videobuf-dma-sg.c
+++ b/drivers/media/video/videobuf-dma-sg.c
@@ -211,13 +211,36 @@ EXPORT_SYMBOL_GPL(videobuf_dma_init_user);
 int videobuf_dma_init_kernel(struct videobuf_dmabuf *dma, int direction,
 			     int nr_pages)
 {
+	int i;
+
 	dprintk(1, "init kernel [%d pages]\n", nr_pages);
 
 	dma->direction = direction;
-	dma->vaddr = vmalloc_32(nr_pages << PAGE_SHIFT);
+	dma->vaddr_pages = kcalloc(nr_pages, sizeof(*dma->vaddr_pages),
+				   GFP_KERNEL);
+	if (!dma->vaddr_pages)
+		return -ENOMEM;
+
+	dma->dma_addr = kcalloc(nr_pages, sizeof(*dma->dma_addr), GFP_KERNEL);
+	if (!dma->dma_addr) {
+		kfree(dma->vaddr_pages);
+		return -ENOMEM;
+	}
+	for (i = 0; i < nr_pages; i++) {
+		void *addr;
+
+		addr = dma_alloc_coherent(dma->dev, PAGE_SIZE,
+					  &(dma->dma_addr[i]), GFP_KERNEL);
+		if (addr == NULL)
+			goto out_free_pages;
+
+		dma->vaddr_pages[i] = virt_to_page(addr);
+	}
+	dma->vaddr = vmap(dma->vaddr_pages, nr_pages, VM_MAP | VM_IOREMAP,
+			  PAGE_KERNEL);
 	if (NULL == dma->vaddr) {
 		dprintk(1, "vmalloc_32(%d pages) failed\n", nr_pages);
-		return -ENOMEM;
+		goto out_free_pages;
 	}
 
 	dprintk(1, "vmalloc is at addr 0x%08lx, size=%d\n",
@@ -228,6 +251,18 @@ int videobuf_dma_init_kernel(struct videobuf_dmabuf *dma, int direction,
 	dma->nr_pages = nr_pages;
 
 	return 0;
+out_free_pages:
+	while (i > 0) {
+		void *addr = page_address(dma->vaddr_pages[i]);
+		dma_free_coherent(dma->dev, PAGE_SIZE, addr, dma->dma_addr[i]);
+		i--;
+	}
+	kfree(dma->dma_addr);
+	dma->dma_addr = NULL;
+	kfree(dma->vaddr_pages);
+	dma->vaddr_pages = NULL;
+
+	return -ENOMEM;
 }
 EXPORT_SYMBOL_GPL(videobuf_dma_init_kernel);
 
@@ -322,8 +357,21 @@ int videobuf_dma_free(struct videobuf_dmabuf *dma)
 		dma->pages = NULL;
 	}
 
-	vfree(dma->vaddr);
-	dma->vaddr = NULL;
+	if (dma->dma_addr) {
+		for (i = 0; i < dma->nr_pages; i++) {
+			void *addr;
+
+			addr = page_address(dma->vaddr_pages[i]);
+			dma_free_coherent(dma->dev, PAGE_SIZE, addr,
+					  dma->dma_addr[i]);
+		}
+		kfree(dma->dma_addr);
+		dma->dma_addr = NULL;
+		kfree(dma->vaddr_pages);
+		dma->vaddr_pages = NULL;
+		vunmap(dma->vaddr);
+		dma->vaddr = NULL;
+	}
 
 	if (dma->bus_addr)
 		dma->bus_addr = 0;
@@ -461,6 +509,11 @@ static int __videobuf_iolock(struct videobuf_queue *q,
 
 	MAGIC_CHECK(mem->magic, MAGIC_SG_MEM);
 
+	if (!mem->dma.dev)
+		mem->dma.dev = q->dev;
+	else
+		WARN_ON(mem->dma.dev != q->dev);
+
 	switch (vb->memory) {
 	case V4L2_MEMORY_MMAP:
 	case V4L2_MEMORY_USERPTR:
diff --git a/include/media/videobuf-dma-sg.h b/include/media/videobuf-dma-sg.h
index d8fb601..870cb21 100644
--- a/include/media/videobuf-dma-sg.h
+++ b/include/media/videobuf-dma-sg.h
@@ -53,6 +53,9 @@ struct videobuf_dmabuf {
 
 	/* for kernel buffers */
 	void                *vaddr;
+	struct page	    **vaddr_pages;
+	dma_addr_t	    *dma_addr;
+	struct device	    *dev;
 
 	/* for overlay buffers (pci-pci dma) */
 	dma_addr_t          bus_addr;

[-- Attachment #3: Type: text/plain, Size: 138 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel

^ permalink raw reply related	[flat|nested] 66+ messages in thread

* Re: Load increase after memory upgrade (part2)
  2012-01-23 22:32                                                     ` Konrad Rzeszutek Wilk
@ 2012-01-24  8:58                                                       ` Jan Beulich
  2012-01-24 14:17                                                         ` Konrad Rzeszutek Wilk
  2012-01-24 21:32                                                       ` Carsten Schiers
                                                                         ` (2 subsequent siblings)
  3 siblings, 1 reply; 66+ messages in thread
From: Jan Beulich @ 2012-01-24  8:58 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk, Konrad Rzeszutek Wilk
  Cc: Sander Eikelenboom, xen-devel

>>> On 23.01.12 at 23:32, Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> wrote:
> On Wed, Jan 18, 2012 at 10:29:23AM -0400, Konrad Rzeszutek Wilk wrote:
>> On Wed, Jan 18, 2012 at 11:35:35AM +0000, Jan Beulich wrote:
>> > >>> On 17.01.12 at 22:02, Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> wrote:
>> > > The issue as I understand is that the DVB drivers allocate their buffers
>> > > from 0->4GB most (all the time?) so they never have to do bounce-buffering.
>> > > 
>> > > While the pv-ops one ends up quite frequently doing the bounce-buffering, 
>> > > which
>> > > implies that the DVB drivers end up allocating their buffers above the 
> 4GB.
>> > > This means we end up spending some CPU time (in the guest) copying the 
>> > > memory
>> > > from >4GB to 0-4GB region (And vice-versa).
>> > 
>> > This reminds me of something (not sure what XenoLinux you use for
>> > comparison) - how are they allocating that memory? Not vmalloc_32()
>> 
>> I was using the 2.6.18, then the one I saw on Google for Gentoo, and now
>> I am going to look at the 2.6.38 from OpenSuSE.
>> 
>> > by chance (I remember having seen numerous uses under - iirc -
>> > drivers/media/)?
>> > 
>> > Obviously, vmalloc_32() and any GFP_DMA32 allocations do *not* do
>> > what their (driver) callers might expect in a PV guest (including the
>> > contiguity assumption for the latter, recalling that you earlier said
>> > you were able to see the problem after several guest starts), and I
>> > had put into our kernels an adjustment to make vmalloc_32() actually
>> > behave as expected.
>> 
>> Aaah.. The plot thickens! Let me look in the sources! Thanks for the
>> pointer.
> 
> Jan hints lead me to the videobuf-dma-sg.c which does indeed to vmalloc_32
> and then performs PCI DMA operations on the allocted vmalloc_32
> area.
> 
> So I cobbled up the attached patch (hadn't actually tested it and sadly
> won't until next week) which removes the call to vmalloc_32 and instead
> sets up DMA allocated set of pages.

What a big patch (which would need re-doing for every vmalloc_32()
caller)! Fixing vmalloc_32() would be much less intrusive (reproducing
our 3.2 version of the affected function below, but clearly that's not
pv-ops ready).

Jan

static void *__vmalloc_area_node(struct vm_struct *area, gfp_t gfp_mask,
				 pgprot_t prot, int node, void *caller)
{
	const int order = 0;
	struct page **pages;
	unsigned int nr_pages, array_size, i;
	gfp_t nested_gfp = (gfp_mask & GFP_RECLAIM_MASK) | __GFP_ZERO;
#ifdef CONFIG_XEN
	gfp_t dma_mask = gfp_mask & (__GFP_DMA | __GFP_DMA32);

	BUILD_BUG_ON((__GFP_DMA | __GFP_DMA32) != (__GFP_DMA + __GFP_DMA32));
	if (dma_mask == (__GFP_DMA | __GFP_DMA32))
		gfp_mask &= ~(__GFP_DMA | __GFP_DMA32);
#endif

	nr_pages = (area->size - PAGE_SIZE) >> PAGE_SHIFT;
	array_size = (nr_pages * sizeof(struct page *));

	area->nr_pages = nr_pages;
	/* Please note that the recursion is strictly bounded. */
	if (array_size > PAGE_SIZE) {
		pages = __vmalloc_node(array_size, 1, nested_gfp|__GFP_HIGHMEM,
				PAGE_KERNEL, node, caller);
		area->flags |= VM_VPAGES;
	} else {
		pages = kmalloc_node(array_size, nested_gfp, node);
	}
	area->pages = pages;
	area->caller = caller;
	if (!area->pages) {
		remove_vm_area(area->addr);
		kfree(area);
		return NULL;
	}

	for (i = 0; i < area->nr_pages; i++) {
		struct page *page;
		gfp_t tmp_mask = gfp_mask | __GFP_NOWARN;

		if (node < 0)
			page = alloc_page(tmp_mask);
		else
			page = alloc_pages_node(node, tmp_mask, order);

		if (unlikely(!page)) {
			/* Successfully allocated i pages, free them in __vunmap() */
			area->nr_pages = i;
			goto fail;
		}
		area->pages[i] = page;
#ifdef CONFIG_XEN
		if (dma_mask) {
			if (xen_limit_pages_to_max_mfn(page, 0, 32)) {
				area->nr_pages = i + 1;
				goto fail;
			}
			if (gfp_mask & __GFP_ZERO)
				clear_highpage(page);
		}
#endif
	}

	if (map_vm_area(area, prot, &pages))
		goto fail;
	return area->addr;

fail:
	warn_alloc_failed(gfp_mask, order,
			  "vmalloc: allocation failure, allocated %ld of %ld bytes\n",
			  (area->nr_pages*PAGE_SIZE), area->size);
	vfree(area->addr);
	return NULL;
}

...

#if defined(CONFIG_64BIT) && defined(CONFIG_ZONE_DMA32)
#define GFP_VMALLOC32 GFP_DMA32 | GFP_KERNEL
#elif defined(CONFIG_64BIT) && defined(CONFIG_ZONE_DMA)
#define GFP_VMALLOC32 GFP_DMA | GFP_KERNEL
#elif defined(CONFIG_XEN)
#define GFP_VMALLOC32 __GFP_DMA | __GFP_DMA32 | GFP_KERNEL
#else
#define GFP_VMALLOC32 GFP_KERNEL
#endif

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: Load increase after memory upgrade (part2)
  2012-01-24  8:58                                                       ` Jan Beulich
@ 2012-01-24 14:17                                                         ` Konrad Rzeszutek Wilk
  0 siblings, 0 replies; 66+ messages in thread
From: Konrad Rzeszutek Wilk @ 2012-01-24 14:17 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Konrad Rzeszutek Wilk, xen-devel, Sander Eikelenboom

On Tue, Jan 24, 2012 at 08:58:22AM +0000, Jan Beulich wrote:
> >>> On 23.01.12 at 23:32, Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> wrote:
> > On Wed, Jan 18, 2012 at 10:29:23AM -0400, Konrad Rzeszutek Wilk wrote:
> >> On Wed, Jan 18, 2012 at 11:35:35AM +0000, Jan Beulich wrote:
> >> > >>> On 17.01.12 at 22:02, Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> wrote:
> >> > > The issue as I understand is that the DVB drivers allocate their buffers
> >> > > from 0->4GB most (all the time?) so they never have to do bounce-buffering.
> >> > > 
> >> > > While the pv-ops one ends up quite frequently doing the bounce-buffering, 
> >> > > which
> >> > > implies that the DVB drivers end up allocating their buffers above the 
> > 4GB.
> >> > > This means we end up spending some CPU time (in the guest) copying the 
> >> > > memory
> >> > > from >4GB to 0-4GB region (And vice-versa).
> >> > 
> >> > This reminds me of something (not sure what XenoLinux you use for
> >> > comparison) - how are they allocating that memory? Not vmalloc_32()
> >> 
> >> I was using the 2.6.18, then the one I saw on Google for Gentoo, and now
> >> I am going to look at the 2.6.38 from OpenSuSE.
> >> 
> >> > by chance (I remember having seen numerous uses under - iirc -
> >> > drivers/media/)?
> >> > 
> >> > Obviously, vmalloc_32() and any GFP_DMA32 allocations do *not* do
> >> > what their (driver) callers might expect in a PV guest (including the
> >> > contiguity assumption for the latter, recalling that you earlier said
> >> > you were able to see the problem after several guest starts), and I
> >> > had put into our kernels an adjustment to make vmalloc_32() actually
> >> > behave as expected.
> >> 
> >> Aaah.. The plot thickens! Let me look in the sources! Thanks for the
> >> pointer.
> > 
> > Jan hints lead me to the videobuf-dma-sg.c which does indeed to vmalloc_32
> > and then performs PCI DMA operations on the allocted vmalloc_32
> > area.
> > 
> > So I cobbled up the attached patch (hadn't actually tested it and sadly
> > won't until next week) which removes the call to vmalloc_32 and instead
> > sets up DMA allocated set of pages.
> 
> What a big patch (which would need re-doing for every vmalloc_32()
> caller)! Fixing vmalloc_32() would be much less intrusive (reproducing
> our 3.2 version of the affected function below, but clearly that's not
> pv-ops ready).

I just want to get to the bottom of this before attempting a proper fix.

> 
> Jan
> 
> static void *__vmalloc_area_node(struct vm_struct *area, gfp_t gfp_mask,
> 				 pgprot_t prot, int node, void *caller)
> {
> 	const int order = 0;
> 	struct page **pages;
> 	unsigned int nr_pages, array_size, i;
> 	gfp_t nested_gfp = (gfp_mask & GFP_RECLAIM_MASK) | __GFP_ZERO;
> #ifdef CONFIG_XEN
> 	gfp_t dma_mask = gfp_mask & (__GFP_DMA | __GFP_DMA32);
> 
> 	BUILD_BUG_ON((__GFP_DMA | __GFP_DMA32) != (__GFP_DMA + __GFP_DMA32));
> 	if (dma_mask == (__GFP_DMA | __GFP_DMA32))
> 		gfp_mask &= ~(__GFP_DMA | __GFP_DMA32);
> #endif
> 
> 	nr_pages = (area->size - PAGE_SIZE) >> PAGE_SHIFT;
> 	array_size = (nr_pages * sizeof(struct page *));
> 
> 	area->nr_pages = nr_pages;
> 	/* Please note that the recursion is strictly bounded. */
> 	if (array_size > PAGE_SIZE) {
> 		pages = __vmalloc_node(array_size, 1, nested_gfp|__GFP_HIGHMEM,
> 				PAGE_KERNEL, node, caller);
> 		area->flags |= VM_VPAGES;
> 	} else {
> 		pages = kmalloc_node(array_size, nested_gfp, node);
> 	}
> 	area->pages = pages;
> 	area->caller = caller;
> 	if (!area->pages) {
> 		remove_vm_area(area->addr);
> 		kfree(area);
> 		return NULL;
> 	}
> 
> 	for (i = 0; i < area->nr_pages; i++) {
> 		struct page *page;
> 		gfp_t tmp_mask = gfp_mask | __GFP_NOWARN;
> 
> 		if (node < 0)
> 			page = alloc_page(tmp_mask);
> 		else
> 			page = alloc_pages_node(node, tmp_mask, order);
> 
> 		if (unlikely(!page)) {
> 			/* Successfully allocated i pages, free them in __vunmap() */
> 			area->nr_pages = i;
> 			goto fail;
> 		}
> 		area->pages[i] = page;
> #ifdef CONFIG_XEN
> 		if (dma_mask) {
> 			if (xen_limit_pages_to_max_mfn(page, 0, 32)) {
> 				area->nr_pages = i + 1;
> 				goto fail;
> 			}
> 			if (gfp_mask & __GFP_ZERO)
> 				clear_highpage(page);
> 		}
> #endif
> 	}
> 
> 	if (map_vm_area(area, prot, &pages))
> 		goto fail;
> 	return area->addr;
> 
> fail:
> 	warn_alloc_failed(gfp_mask, order,
> 			  "vmalloc: allocation failure, allocated %ld of %ld bytes\n",
> 			  (area->nr_pages*PAGE_SIZE), area->size);
> 	vfree(area->addr);
> 	return NULL;
> }
> 
> ...
> 
> #if defined(CONFIG_64BIT) && defined(CONFIG_ZONE_DMA32)
> #define GFP_VMALLOC32 GFP_DMA32 | GFP_KERNEL
> #elif defined(CONFIG_64BIT) && defined(CONFIG_ZONE_DMA)
> #define GFP_VMALLOC32 GFP_DMA | GFP_KERNEL
> #elif defined(CONFIG_XEN)
> #define GFP_VMALLOC32 __GFP_DMA | __GFP_DMA32 | GFP_KERNEL
> #else
> #define GFP_VMALLOC32 GFP_KERNEL
> #endif

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: Load increase after memory upgrade (part2)
  2012-01-23 22:32                                                     ` Konrad Rzeszutek Wilk
  2012-01-24  8:58                                                       ` Jan Beulich
@ 2012-01-24 21:32                                                       ` Carsten Schiers
  2012-01-25 12:02                                                       ` Carsten Schiers
  2012-01-25 19:06                                                       ` Carsten Schiers
  3 siblings, 0 replies; 66+ messages in thread
From: Carsten Schiers @ 2012-01-24 21:32 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk, Konrad Rzeszutek Wilk
  Cc: Sander Eikelenboom, xen-devel, Jan Beulich

Konrad, 

I implemented the patch in a 3.1.2 kernel, but the patched function doesn't seem to be called (I set debug=1 for the module).
I think it's only for video capturing devices.

But I grepped around and found a vmalloc_32 in drivers/media/common/saa7146_core.c, line 182, function saa7146_vmalloc_build_pgtable,
which is included in the saa7146.ko module. This would be the DVB chip. Maybe you can rework the patch so that we can just test what
you intended to test.
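
From my (layman's) reading of the source, the call pattern there is roughly:

	/* abridged sketch of saa7146_vmalloc_build_pgtable(); error handling omitted */
	mem = vmalloc_32(length);		/* below 4GB only pseudo-physically under PV */
	pt->slist = vmalloc_to_sg(mem, pages);	/* turn the vmalloc area into an sg list */
	pt->nents = pci_map_sg(pci, pt->slist, pages, PCI_DMA_FROMDEVICE);

so all DMA through that pagetable inherits whatever machine pages
vmalloc_32() happened to hand out.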

Consequently, the patch you did so far doesn't change the load.

Carsten.




-----Original Message-----
From: xen-devel-bounces@lists.xensource.com [mailto:xen-devel-bounces@lists.xensource.com] On Behalf Of Konrad Rzeszutek Wilk
Sent: Monday, 23 January 2012 23:32
To: Konrad Rzeszutek Wilk
Cc: Sander Eikelenboom; xen-devel; Jan Beulich
Subject: Re: [Xen-devel] Load increase after memory upgrade (part2)

On Wed, Jan 18, 2012 at 10:29:23AM -0400, Konrad Rzeszutek Wilk wrote:
> On Wed, Jan 18, 2012 at 11:35:35AM +0000, Jan Beulich wrote:
> > >>> On 17.01.12 at 22:02, Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> wrote:
> > > The issue as I understand is that the DVB drivers allocate their 
> > > buffers from 0->4GB most (all the time?) so they never have to do bounce-buffering.
> > > 
> > > While the pv-ops one ends up quite frequently doing the 
> > > bounce-buffering, which implies that the DVB drivers end up 
> > > allocating their buffers above the 4GB.
> > > This means we end up spending some CPU time (in the guest) copying 
> > > the memory from >4GB to 0-4GB region (And vice-versa).
> > 
> > This reminds me of something (not sure what XenoLinux you use for
> > comparison) - how are they allocating that memory? Not vmalloc_32()
> 
> I was using the 2.6.18, then the one I saw on Google for Gentoo, and 
> now I am going to look at the 2.6.38 from OpenSuSE.
> 
> > by chance (I remember having seen numerous uses under - iirc - 
> > drivers/media/)?
> > 
> > Obviously, vmalloc_32() and any GFP_DMA32 allocations do *not* do 
> > what their (driver) callers might expect in a PV guest (including 
> > the contiguity assumption for the latter, recalling that you earlier 
> > said you were able to see the problem after several guest starts), 
> > and I had put into our kernels an adjustment to make vmalloc_32() 
> > actually behave as expected.
> 
> Aaah.. The plot thickens! Let me look in the sources! Thanks for the 
> pointer.

Jan hints lead me to the videobuf-dma-sg.c which does indeed to vmalloc_32 and then performs PCI DMA operations on the allocted vmalloc_32 area.

So I cobbled up the attached patch (hadn't actually tested it and sadly won't until next week) which removes the call to vmalloc_32 and instead sets up DMA allocated set of pages.

If that fixes it for you that is awesome, but if it breaks please send me your logs.

Cheers,
Konrad
_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: Load increase after memory upgrade (part2)
  2012-01-23 22:32                                                     ` Konrad Rzeszutek Wilk
  2012-01-24  8:58                                                       ` Jan Beulich
  2012-01-24 21:32                                                       ` Carsten Schiers
@ 2012-01-25 12:02                                                       ` Carsten Schiers
  2012-01-25 19:06                                                       ` Carsten Schiers
  3 siblings, 0 replies; 66+ messages in thread
From: Carsten Schiers @ 2012-01-25 12:02 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk, Konrad Rzeszutek Wilk
  Cc: Sander Eikelenboom, xen-devel, Jan Beulich


[-- Attachment #1.1: Type: text/plain, Size: 2880 bytes --]

I can now confirm that saa7146_vmalloc_build_pgtable and vmalloc_to_sg are called once per
PCI card and will allocate 329 pages. Sorry, but I am not in a position to modify your patch
to patch the functions in the right way, but I am happy to test...
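
(For scale, assuming 4 KiB pages: 329 pages x 4096 bytes ≈ 1.3 MiB of DMA buffer per card.)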

 
BR, Carsten.
 
-----Original Message-----
To: Konrad Rzeszutek Wilk <konrad@darnok.org>; 
CC: Sander Eikelenboom <linux@eikelenboom.it>; xen-devel <xen-devel@lists.xensource.com>; Jan Beulich <JBeulich@suse.com>; 
From: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Sent: Mon 23.01.2012 23:42
Subject: Re: [Xen-devel] Load increase after memory upgrade (part2)
Attachment: vmalloc
On Wed, Jan 18, 2012 at 10:29:23AM -0400, Konrad Rzeszutek Wilk wrote:
> On Wed, Jan 18, 2012 at 11:35:35AM +0000, Jan Beulich wrote:
> > >>> On 17.01.12 at 22:02, Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> wrote:
> > > The issue as I understand is that the DVB drivers allocate their buffers
> > > from 0->4GB most (all the time?) so they never have to do bounce-buffering.
> > > 
> > > While the pv-ops one ends up quite frequently doing the bounce-buffering, 
> > > which
> > > implies that the DVB drivers end up allocating their buffers above the 4GB.
> > > This means we end up spending some CPU time (in the guest) copying the 
> > > memory
> > > from >4GB to 0-4GB region (And vice-versa).
> > 
> > This reminds me of something (not sure what XenoLinux you use for
> > comparison) - how are they allocating that memory? Not vmalloc_32()
> 
> I was using the 2.6.18, then the one I saw on Google for Gentoo, and now
> I am going to look at the 2.6.38 from OpenSuSE.
> 
> > by chance (I remember having seen numerous uses under - iirc -
> > drivers/media/)?
> > 
> > Obviously, vmalloc_32() and any GFP_DMA32 allocations do *not* do
> > what their (driver) callers might expect in a PV guest (including the
> > contiguity assumption for the latter, recalling that you earlier said
> > you were able to see the problem after several guest starts), and I
> > had put into our kernels an adjustment to make vmalloc_32() actually
> > behave as expected.
> 
> Aaah.. The plot thickens! Let me look in the sources! Thanks for the
> pointer.

Jan hints lead me to the videobuf-dma-sg.c which does indeed to vmalloc_32
and then performs PCI DMA operations on the allocted vmalloc_32
area.

So I cobbled up the attached patch (hadn't actually tested it and sadly
won't until next week) which removes the call to vmalloc_32 and instead
sets up DMA allocated set of pages.

If that fixes it for you that is awesome, but if it breaks please
send me your logs.

Cheers,
Konrad
_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel

[-- Attachment #1.2: Type: text/html, Size: 4278 bytes --]

[-- Attachment #2: Type: text/plain, Size: 138 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: Load increase after memory upgrade (part2)
  2012-01-23 22:32                                                     ` Konrad Rzeszutek Wilk
                                                                         ` (2 preceding siblings ...)
  2012-01-25 12:02                                                       ` Carsten Schiers
@ 2012-01-25 19:06                                                       ` Carsten Schiers
  2012-01-25 21:02                                                         ` Konrad Rzeszutek Wilk
  2012-02-15 19:28                                                         ` Konrad Rzeszutek Wilk
  3 siblings, 2 replies; 66+ messages in thread
From: Carsten Schiers @ 2012-01-25 19:06 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk, Konrad Rzeszutek Wilk
  Cc: Sander Eikelenboom, xen-devel, Jan Beulich

Some news: in order to prepare a clean setting, I upgraded to the 3.2.1 kernel. I noticed that the load increase is
reduced a bit, but noticeably. It's only a simple test, running the DomU for 2 minutes, but the idle load is approx.:

  - 2.6.32 pvops		12-13%
  - 3.2.1 pvops		10-11%
  - 2.6.34 XenoLinux	7-8%

BR, Carsten.


-----Original Message-----
From: xen-devel-bounces@lists.xensource.com [mailto:xen-devel-bounces@lists.xensource.com] On Behalf Of Konrad Rzeszutek Wilk
Sent: Monday, 23 January 2012 23:32
To: Konrad Rzeszutek Wilk
Cc: Sander Eikelenboom; xen-devel; Jan Beulich
Subject: Re: [Xen-devel] Load increase after memory upgrade (part2)

On Wed, Jan 18, 2012 at 10:29:23AM -0400, Konrad Rzeszutek Wilk wrote:
> On Wed, Jan 18, 2012 at 11:35:35AM +0000, Jan Beulich wrote:
> > >>> On 17.01.12 at 22:02, Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> wrote:
> > > The issue as I understand is that the DVB drivers allocate their 
> > > buffers from 0->4GB most (all the time?) so they never have to do bounce-buffering.
> > > 
> > > While the pv-ops one ends up quite frequently doing the 
> > > bounce-buffering, which implies that the DVB drivers end up 
> > > allocating their buffers above the 4GB.
> > > This means we end up spending some CPU time (in the guest) copying 
> > > the memory from >4GB to 0-4GB region (And vice-versa).
> > 
> > This reminds me of something (not sure what XenoLinux you use for
> > comparison) - how are they allocating that memory? Not vmalloc_32()
> 
> I was using the 2.6.18, then the one I saw on Google for Gentoo, and 
> now I am going to look at the 2.6.38 from OpenSuSE.
> 
> > by chance (I remember having seen numerous uses under - iirc - 
> > drivers/media/)?
> > 
> > Obviously, vmalloc_32() and any GFP_DMA32 allocations do *not* do 
> > what their (driver) callers might expect in a PV guest (including 
> > the contiguity assumption for the latter, recalling that you earlier 
> > said you were able to see the problem after several guest starts), 
> > and I had put into our kernels an adjustment to make vmalloc_32() 
> > actually behave as expected.
> 
> Aaah.. The plot thickens! Let me look in the sources! Thanks for the 
> pointer.

Jan hints lead me to the videobuf-dma-sg.c which does indeed to vmalloc_32 and then performs PCI DMA operations on the allocted vmalloc_32 area.

So I cobbled up the attached patch (hadn't actually tested it and sadly won't until next week) which removes the call to vmalloc_32 and instead sets up DMA allocated set of pages.

If that fixes it for you that is awesome, but if it breaks please send me your logs.

Cheers,
Konrad
_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: Load increase after memory upgrade (part2)
  2012-01-25 19:06                                                       ` Carsten Schiers
@ 2012-01-25 21:02                                                         ` Konrad Rzeszutek Wilk
  2012-02-15 19:28                                                         ` Konrad Rzeszutek Wilk
  1 sibling, 0 replies; 66+ messages in thread
From: Konrad Rzeszutek Wilk @ 2012-01-25 21:02 UTC (permalink / raw)
  To: Carsten Schiers
  Cc: Konrad Rzeszutek Wilk, xen-devel, Jan Beulich, Sander Eikelenboom

On Wed, Jan 25, 2012 at 08:06:12PM +0100, Carsten Schiers wrote:
> Some news: in order to prepare a clean setting, I upgraded to 3.2.1 kernel. I noticed that the load increase is
> reduced a bit, but noticably. It's only a simple test, running the DomU for 2 minutes, but the idle load is aprox.
> 
>   - 2.6.32 pvops		12-13%
>   - 3.2.1 pvops		10-11%

Yeah. I think this is due to the fix I added to xen-swiotlb to not always
do the bounce copying.
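
The early exit in question looks roughly like this in xen_swiotlb_map_page()
(paraphrased from drivers/xen/swiotlb-xen.c):

	/* Machine address reachable and machine-contiguous: no copy at all. */
	if (dma_capable(dev, dev_addr, size) &&
	    !range_straddles_page_boundary(phys, size) && !swiotlb_force)
		return dev_addr;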

>   - 2.6.34 XenoLinux	7-8%
> 

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: Load increase after memory upgrade (part2)
  2012-01-25 19:06                                                       ` Carsten Schiers
  2012-01-25 21:02                                                         ` Konrad Rzeszutek Wilk
@ 2012-02-15 19:28                                                         ` Konrad Rzeszutek Wilk
  2012-02-16  8:56                                                           ` Jan Beulich
  1 sibling, 1 reply; 66+ messages in thread
From: Konrad Rzeszutek Wilk @ 2012-02-15 19:28 UTC (permalink / raw)
  To: Carsten Schiers
  Cc: Konrad Rzeszutek Wilk, xen-devel, Jan Beulich, Sander Eikelenboom

[-- Attachment #1: Type: text/plain, Size: 467 bytes --]

On Wed, Jan 25, 2012 at 08:06:12PM +0100, Carsten Schiers wrote:
> Some news: in order to prepare a clean setting, I upgraded to 3.2.1 kernel. I noticed that the load increase is
> reduced a bit, but noticably. It's only a simple test, running the DomU for 2 minutes, but the idle load is aprox.
> 
>   - 2.6.32 pvops		12-13%
>   - 3.2.1 pvops		10-11%
>   - 2.6.34 XenoLinux	7-8%

I took a stab at Jan's idea - it compiles, but I haven't been able to test it properly.

[-- Attachment #2: vmalloc_using_xen_limit_pages.patch --]
[-- Type: text/plain, Size: 6845 bytes --]

diff --git a/arch/x86/xen/mmu.c b/arch/x86/xen/mmu.c
index 87f6673..6bb6f68 100644
--- a/arch/x86/xen/mmu.c
+++ b/arch/x86/xen/mmu.c
@@ -47,6 +47,7 @@
 #include <linux/gfp.h>
 #include <linux/memblock.h>
 #include <linux/seq_file.h>
+#include <linux/slab.h>
 
 #include <trace/events/xen.h>
 
@@ -2073,6 +2074,7 @@ void __init xen_init_mmu_ops(void)
 /* Protected by xen_reservation_lock. */
 #define MAX_CONTIG_ORDER 9 /* 2MB */
 static unsigned long discontig_frames[1<<MAX_CONTIG_ORDER];
+static unsigned long limited_frames[1<<MAX_CONTIG_ORDER];
 
 #define VOID_PTE (mfn_pte(0, __pgprot(0)))
 static void xen_zap_pfn_range(unsigned long vaddr, unsigned int order,
@@ -2097,6 +2099,36 @@ static void xen_zap_pfn_range(unsigned long vaddr, unsigned int order,
 	}
 	xen_mc_issue(0);
 }
+static int xen_zap_page_range(struct page *pages, unsigned int order,
+				unsigned long *in_frames,
+				unsigned long *out_frames,
+				void *limit_bitmap)
+{
+	int i, n = 0;
+	struct multicall_space mcs;
+	struct page *page;
+	xen_mc_batch();
+	for (i = 0; i < (1UL<<order); i++) {
+		if (!test_bit(i, limit_bitmap))
+			continue;
+		page = &pages[i];
+		mcs = __xen_mc_entry(0);
+
+		if (in_frames)
+			in_frames[i] = pfn_to_mfn(page_to_pfn(page));
+
+		MULTI_update_va_mapping(mcs.mc, (unsigned long)page_address(page), VOID_PTE, 0);
+		__set_phys_to_machine(page_to_pfn(page), INVALID_P2M_ENTRY);
+
+		if (out_frames)
+			out_frames[i] = page_to_pfn(page);
+		++n;
+
+	}
+	xen_mc_issue(0);
+
+	return n;
+}
 
 /*
  * Update the pfn-to-mfn mappings for a virtual address range, either to
@@ -2140,6 +2172,49 @@ static void xen_remap_exchanged_ptes(unsigned long vaddr, int order,
 
 	xen_mc_issue(0);
 }
+static void xen_remap_exchanged_pages(struct page *pages, int order,
+				     unsigned long *mfns,
+				     unsigned long first_mfn,
+				     void *limit_map)
+{
+	unsigned i, limit;
+	unsigned long mfn;
+	struct page *page;
+
+	xen_mc_batch();
+
+	limit = 1u << order;
+	for (i = 0; i < limit; i++) {
+		struct multicall_space mcs;
+		unsigned flags;
+
+		if (!test_bit(i, limit_map))
+			continue;
+		page = &pages[i];
+		mcs = __xen_mc_entry(0);
+		if (mfns)
+			mfn = mfns[i];
+		else
+			mfn = first_mfn + i;
+
+		if (i < (limit - 1))
+			flags = 0;
+		else {
+			if (order == 0)
+				flags = UVMF_INVLPG | UVMF_ALL;
+			else
+				flags = UVMF_TLB_FLUSH | UVMF_ALL;
+		}
+
+		MULTI_update_va_mapping(mcs.mc, (unsigned long)page_address(page),
+				mfn_pte(mfn, PAGE_KERNEL), flags);
+
+		set_phys_to_machine(page_to_pfn(page), mfn);
+	}
+
+	xen_mc_issue(0);
+}
+
 
 /*
  * Perform the hypercall to exchange a region of our pfns to point to
@@ -2266,6 +2341,90 @@ void xen_destroy_contiguous_region(unsigned long vstart, unsigned int order)
 }
 EXPORT_SYMBOL_GPL(xen_destroy_contiguous_region);
 
+int xen_limit_pages_to_max_mfn(struct page *pages, unsigned int order,
+			       unsigned int address_bits)
+{
+	unsigned long *in_frames = discontig_frames, *out_frames = limited_frames;
+	unsigned long  flags;
+	struct page *page;
+	int success;
+	int i, n = 0;
+	unsigned long _limit_map;
+	unsigned long *limit_map;
+
+	if (xen_feature(XENFEAT_auto_translated_physmap))
+		return 0;
+
+	if (unlikely(order > MAX_CONTIG_ORDER))
+		return -ENOMEM;
+
+	if (BITS_PER_LONG >> order) {
+		limit_map = kzalloc(BITS_TO_LONGS(1U << order) *
+				    sizeof(*limit_map), GFP_KERNEL);
+		if (unlikely(!limit_map))
+			return -ENOMEM;
+	} else
+		limit_map = &_limit_map;
+
+	/* 0. Construct our per page bitmap lookup. */
+
+	if (address_bits && (address_bits < PAGE_SHIFT))
+			return -EINVAL;
+
+	if (order)
+		bitmap_zero(limit_map, 1U << order);
+	else
+		__set_bit(0, limit_map);
+
+	/* 1. Clear the pages */
+	for (i = 0; i < 1ULL << order; i++) {
+		void *vaddr;
+		page = &pages[i];
+		vaddr = page_address(page);
+		if (address_bits) {
+			if (!pfn_to_mfn(virt_to_mfn(vaddr)) >> (address_bits - PAGE_SHIFT))
+				continue;
+			__set_bit(i, limit_map);
+		}
+		if (!PageHighMem(page))
+			memset(vaddr, 0, PAGE_SIZE);
+		else {
+			memset(kmap(page), 0, PAGE_SIZE);
+			kunmap(page);
+			++n;
+		}
+	}
+	/* Check to see if we actually have to do any work. */
+	if (bitmap_empty(limit_map, 1U << order)) {
+		if (limit_map != &_limit_map)
+			kfree(limit_map);
+		return 0;
+	}
+	if (n)
+		kmap_flush_unused();
+
+	spin_lock_irqsave(&xen_reservation_lock, flags);
+
+	/* 2. Zap current PTEs. */
+	n = xen_zap_page_range(pages, order, in_frames, NULL /*out_frames */, limit_map);
+
+	/* 3. Do the exchange for non-contiguous MFNs. */
+	success = xen_exchange_memory(n, 0, in_frames,
+				      n, 0, out_frames, address_bits);
+
+	/* 4. Map new pages in place of old pages. */
+	if (success)
+		xen_remap_exchanged_pages(pages, order, out_frames, 0, limit_map);
+	else
+		xen_remap_exchanged_pages(pages, order, NULL, *in_frames, limit_map);
+
+	spin_unlock_irqrestore(&xen_reservation_lock, flags);
+	if (limit_map != &_limit_map)
+		kfree(limit_map);
+
+	return success ? 0 : -ENOMEM;
+}
+EXPORT_SYMBOL_GPL(xen_limit_pages_to_max_mfn);
 #ifdef CONFIG_XEN_PVHVM
 static void xen_hvm_exit_mmap(struct mm_struct *mm)
 {
diff --git a/include/xen/xen-ops.h b/include/xen/xen-ops.h
index 03c85d7..ae5b1ef 100644
--- a/include/xen/xen-ops.h
+++ b/include/xen/xen-ops.h
@@ -28,4 +28,6 @@ int xen_remap_domain_mfn_range(struct vm_area_struct *vma,
 			       unsigned long mfn, int nr,
 			       pgprot_t prot, unsigned domid);
 
+int xen_limit_pages_to_max_mfn(struct page *pages, unsigned int order,
+			       unsigned int address_bits);
 #endif /* INCLUDE_XEN_OPS_H */
diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index 27be2f0..4fa2066 100644
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -31,6 +31,8 @@
 #include <asm/tlbflush.h>
 #include <asm/shmparam.h>
 
+#include <xen/xen.h>
+#include <xen/xen-ops.h>
 /*** Page table manipulation functions ***/
 
 static void vunmap_pte_range(pmd_t *pmd, unsigned long addr, unsigned long end)
@@ -1550,7 +1552,11 @@ static void *__vmalloc_area_node(struct vm_struct *area, gfp_t gfp_mask,
 	struct page **pages;
 	unsigned int nr_pages, array_size, i;
 	gfp_t nested_gfp = (gfp_mask & GFP_RECLAIM_MASK) | __GFP_ZERO;
-
+	gfp_t dma_mask = gfp_mask & (__GFP_DMA | __GFP_DMA32);
+	if (xen_pv_domain()) {
+		if (dma_mask == (__GFP_DMA | __GFP_DMA32))
+			gfp_mask &= (__GFP_DMA | __GFP_DMA32);
+	}
 	nr_pages = (area->size - PAGE_SIZE) >> PAGE_SHIFT;
 	array_size = (nr_pages * sizeof(struct page *));
 
@@ -1586,6 +1592,16 @@ static void *__vmalloc_area_node(struct vm_struct *area, gfp_t gfp_mask,
 			goto fail;
 		}
 		area->pages[i] = page;
+		if (xen_pv_domain()) {
+			if (dma_mask) {
+				if (xen_limit_pages_to_max_mfn(page, 0, 32)) {
+					area->nr_pages = i + 1;
+					goto fail;
+				}
+			if (gfp_mask & __GFP_ZERO)
+				clear_highpage(page);
+			}
+		}
 	}
 
 	if (map_vm_area(area, prot, &pages))

[-- Attachment #3: Type: text/plain, Size: 138 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel

^ permalink raw reply related	[flat|nested] 66+ messages in thread

* Re: Load increase after memory upgrade (part2)
  2012-02-15 19:28                                                         ` Konrad Rzeszutek Wilk
@ 2012-02-16  8:56                                                           ` Jan Beulich
  2012-02-17 15:07                                                             ` Konrad Rzeszutek Wilk
  0 siblings, 1 reply; 66+ messages in thread
From: Jan Beulich @ 2012-02-16  8:56 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk
  Cc: Konrad Rzeszutek Wilk, xen-devel, Carsten Schiers, Sander Eikelenboom

>>> On 15.02.12 at 20:28, Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> wrote:
>@@ -1550,7 +1552,11 @@ static void *__vmalloc_area_node(struct vm_struct *area, gfp_t gfp_mask,
> 	struct page **pages;
> 	unsigned int nr_pages, array_size, i;
> 	gfp_t nested_gfp = (gfp_mask & GFP_RECLAIM_MASK) | __GFP_ZERO;
>-
>+	gfp_t dma_mask = gfp_mask & (__GFP_DMA | __GFP_DMA32);
>+	if (xen_pv_domain()) {
>+		if (dma_mask == (__GFP_DMA | __GFP_DMA32))

I didn't spot where you force this normally invalid combination, without
which the change won't affect vmalloc_32() in a 32-bit kernel.

>+			gfp_mask &= (__GFP_DMA | __GFP_DMA32);

			gfp_mask &= ~(__GFP_DMA | __GFP_DMA32);

Jan

>+	}
> 	nr_pages = (area->size - PAGE_SIZE) >> PAGE_SHIFT;
> 	array_size = (nr_pages * sizeof(struct page *));
> 

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: Load increase after memory upgrade (part2)
  2012-02-16  8:56                                                           ` Jan Beulich
@ 2012-02-17 15:07                                                             ` Konrad Rzeszutek Wilk
  2012-02-28 14:35                                                               ` Carsten Schiers
  0 siblings, 1 reply; 66+ messages in thread
From: Konrad Rzeszutek Wilk @ 2012-02-17 15:07 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Konrad Rzeszutek Wilk, xen-devel, Carsten Schiers, Sander Eikelenboom

On Thu, Feb 16, 2012 at 08:56:53AM +0000, Jan Beulich wrote:
> >>> On 15.02.12 at 20:28, Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> wrote:
> >@@ -1550,7 +1552,11 @@ static void *__vmalloc_area_node(struct vm_struct *area, gfp_t gfp_mask,
> > 	struct page **pages;
> > 	unsigned int nr_pages, array_size, i;
> > 	gfp_t nested_gfp = (gfp_mask & GFP_RECLAIM_MASK) | __GFP_ZERO;
> >-
> >+	gfp_t dma_mask = gfp_mask & (__GFP_DMA | __GFP_DMA32);
> >+	if (xen_pv_domain()) {
> >+		if (dma_mask == (__GFP_DMA | __GFP_DMA32))
> 
> I didn't spot where you force this normally invalid combination, without
> which the change won't affect vmalloc32() in a 32-bit kernel.
> 
> >+			gfp_mask &= (__GFP_DMA | __GFP_DMA32);
> 
> 			gfp_mask &= ~(__GFP_DMA | __GFP_DMA32);
> 
> Jan

Duh!
Good eyes. Thanks for catching that.

> 
> >+	}
> > 	nr_pages = (area->size - PAGE_SIZE) >> PAGE_SHIFT;
> > 	array_size = (nr_pages * sizeof(struct page *));
> > 
> 

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: Load increase after memory upgrade (part2)
  2012-02-17 15:07                                                             ` Konrad Rzeszutek Wilk
@ 2012-02-28 14:35                                                               ` Carsten Schiers
  2012-02-29 12:10                                                                 ` Carsten Schiers
  0 siblings, 1 reply; 66+ messages in thread
From: Carsten Schiers @ 2012-02-28 14:35 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk
  Cc: Sander Eikelenboom, xen-devel, Jan Beulich, Konrad Rzeszutek Wilk


[-- Attachment #1.1: Type: text/plain, Size: 1775 bytes --]

Well, let me check for a longer period of time, and especially whether the DomU is still
working (I can only do that from home), but the load looks pretty good after applying the
patch to 3.2.8 :-D.

 
BR,

Carsten.
 
-----Original Message-----
To: Jan Beulich <JBeulich@suse.com>; 
CC: Konrad Rzeszutek Wilk <konrad@darnok.org>; xen-devel <xen-devel@lists.xensource.com>; Carsten Schiers <carsten@schiers.de>; Sander Eikelenboom <linux@eikelenboom.it>; 
From: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Sent: Fri 17.02.2012 16:18
Subject: Re: [Xen-devel] Load increase after memory upgrade (part2)
On Thu, Feb 16, 2012 at 08:56:53AM +0000, Jan Beulich wrote:
> >>> On 15.02.12 at 20:28, Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> wrote:
> >@@ -1550,7 +1552,11 @@ static void *__vmalloc_area_node(struct vm_struct *area, gfp_t gfp_mask,
> > struct page **pages;
> > unsigned int nr_pages, array_size, i;
> > gfp_t nested_gfp = (gfp_mask & GFP_RECLAIM_MASK) | __GFP_ZERO;
> >-
> >+gfp_t dma_mask = gfp_mask & (__GFP_DMA | __GFP_DMA32);
> >+if (xen_pv_domain()) {
> >+if (dma_mask == (__GFP_DMA | __GFP_DMA32))
> 
> I didn't spot where you force this normally invalid combination, without
> which the change won't affect vmalloc_32() in a 32-bit kernel.
> 
> >+gfp_mask &= (__GFP_DMA | __GFP_DMA32);
> 
> gfp_mask &= ~(__GFP_DMA | __GFP_DMA32);
> 
> Jan

Duh!
Good eyes. Thanks for catching that.

> 
> >+}
> > nr_pages = (area->size - PAGE_SIZE) >> PAGE_SHIFT;
> > array_size = (nr_pages * sizeof(struct page *));
> > 
> 

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel

[-- Attachment #1.2: Type: text/html, Size: 3077 bytes --]

[-- Attachment #2: Type: text/plain, Size: 126 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: Load increase after memory upgrade (part2)
  2012-02-28 14:35                                                               ` Carsten Schiers
@ 2012-02-29 12:10                                                                 ` Carsten Schiers
  2012-02-29 12:56                                                                   ` Carsten Schiers
  0 siblings, 1 reply; 66+ messages in thread
From: Carsten Schiers @ 2012-02-29 12:10 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk
  Cc: Konrad Rzeszutek Wilk, xen-devel, Jan Beulich, Sander Eikelenboom


[-- Attachment #1.1: Type: text/plain, Size: 2578 bytes --]

Great news: it works and the load is back to normal. In the attached graph you can see the peak
in blue (compilation of the patched 3.2.8 kernel) and then, after 16:00, the go-live of the
video DomU. We are below an average of 7% usage (the graph's figures are in per mille).


Thanks so much. Is that already "the final patch"? 

 
BR, Carsten.

 

 
-----Original Message-----
To: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>; 
CC: Sander Eikelenboom <linux@eikelenboom.it>; xen-devel <xen-devel@lists.xensource.com>; Jan Beulich <jbeulich@suse.com>; Konrad Rzeszutek Wilk <konrad@darnok.org>; 
From: Carsten Schiers <carsten@schiers.de>
Sent: Tue 28.02.2012 15:39
Subject: Re: [Xen-devel] Load increase after memory upgrade (part2)
Attachment: inline.txt
 

Well, let me check for a longer period of time, and especially whether the DomU is still
working (I can only do that from home), but the load looks pretty good after applying the
patch to 3.2.8 :-D.

 
BR,

Carsten.
 
-----Original Message-----
To: Jan Beulich <JBeulich@suse.com>; 
CC: Konrad Rzeszutek Wilk <konrad@darnok.org>; xen-devel <xen-devel@lists.xensource.com>; Carsten Schiers <carsten@schiers.de>; Sander Eikelenboom <linux@eikelenboom.it>; 
From: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Sent: Fri 17.02.2012 16:18
Subject: Re: [Xen-devel] Load increase after memory upgrade (part2)
On Thu, Feb 16, 2012 at 08:56:53AM +0000, Jan Beulich wrote:
> >>> On 15.02.12 at 20:28, Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> wrote:
> >@@ -1550,7 +1552,11 @@ static void *__vmalloc_area_node(struct vm_struct *area, gfp_t gfp_mask,
> > struct page **pages;
> > unsigned int nr_pages, array_size, i;
> > gfp_t nested_gfp = (gfp_mask & GFP_RECLAIM_MASK) | __GFP_ZERO;
> >-
> >+gfp_t dma_mask = gfp_mask & (__GFP_DMA | __GFP_DMA32);
> >+if (xen_pv_domain()) {
> >+if (dma_mask == (__GFP_DMA | __GFP_DMA32))
> 
> I didn't spot where you force this normally invalid combination, without
> which the change won't affect vmalloc_32() in a 32-bit kernel.
> 
> >+gfp_mask &= (__GFP_DMA | __GFP_DMA32);
> 
> gfp_mask &= ~(__GFP_DMA | __GFP_DMA32);
> 
> Jan

Duh!
Good eyes. Thanks for catching that.

> 
> >+}
> > nr_pages = (area->size - PAGE_SIZE) >> PAGE_SHIFT;
> > array_size = (nr_pages * sizeof(struct page *));
> > 
> 

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel
 

[-- Attachment #1.2: Type: text/html, Size: 29524 bytes --]

[-- Attachment #2: Type: text/plain, Size: 126 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: Load increase after memory upgrade (part2)
  2012-02-29 12:10                                                                 ` Carsten Schiers
@ 2012-02-29 12:56                                                                   ` Carsten Schiers
  2012-05-11  9:39                                                                     ` Carsten Schiers
  0 siblings, 1 reply; 66+ messages in thread
From: Carsten Schiers @ 2012-02-29 12:56 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk
  Cc: Sander Eikelenboom, xen-devel, Jan Beulich, Konrad Rzeszutek Wilk


[-- Attachment #1.1: Type: text/plain, Size: 6280 bytes --]

I am very sorry. I accidentally started the DomU with the wrong config file, so it's clear why there is no difference
between the two. And unfortunately, the DomU with the correct config file hits a BUG:

 


[   14.674883] BUG: unable to handle kernel paging request at ffffc7fffffff000
[   14.674910] IP: [<ffffffff811b4c0b>] swiotlb_bounce+0x2e/0x31
[   14.674930] PGD 0 
[   14.674940] Oops: 0002 [#1] SMP 
[   14.674952] CPU 0 
[   14.674957] Modules linked in: nfsd exportfs nfs lockd fscache auth_rpcgss nfs_acl sunrpc tda10023 budget_av evdev saa7146_vv videodev v4l2_compat_ioctl32 videobuf_dma_sg videobuf_core budget_core snd_pcm dvb_core snd_timer saa7146 snd ttpci_eeprom soundcore snd_page_alloc i2c_core pcspkr ext3 jbd mbcache xen_netfront xen_blkfront
[   14.675057] 
[   14.675065] Pid: 0, comm: swapper/0 Not tainted 3.2.8-amd64 #1  
[   14.675079] RIP: e030:[<ffffffff811b4c0b>]  [<ffffffff811b4c0b>] swiotlb_bounce+0x2e/0x31
[   14.675097] RSP: e02b:ffff880013fabe58  EFLAGS: 00010202
[   14.675106] RAX: ffff880012800000 RBX: 0000000000000001 RCX: 0000000000001000
[   14.675116] RDX: 0000000000001000 RSI: ffff880012800000 RDI: ffffc7fffffff000
[   14.675126] RBP: 0000000000000002 R08: ffffc7fffffff000 R09: ffff880013f98000
[   14.675137] R10: 0000000000000001 R11: ffff880003376000 R12: ffff8800032c5090
[   14.675147] R13: 0000000000000149 R14: ffff8800033e0000 R15: ffffffff81601fd8
[   14.675163] FS:  00007f3ff9893700(0000) GS:ffff880013fa8000(0000) knlGS:0000000000000000
[   14.675175] CS:  e033 DS: 0000 ES: 0000 CR0: 000000008005003b
[   14.675184] CR2: ffffc7fffffff000 CR3: 0000000012683000 CR4: 0000000000000660
[   14.675195] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[   14.675205] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[   14.675216] Process swapper/0 (pid: 0, threadinfo ffffffff81600000, task ffffffff8160d020)
[   14.675227] Stack:
[   14.675232]  ffffffff81211826 ffff880002eda000 0000000000000000 ffffc90000408000
[   14.675251]  00000000000b0150 0000000000000006 ffffffffa013ec4a ffffffff810946cd
[   14.675270]  ffffffff81099203 ffff880003376000 0000000000000000 ffff880002eda4b0
[   14.675289] Call Trace:
[   14.675295]  <IRQ> 
[   14.675307]  [<ffffffff81211826>] ? xen_swiotlb_sync_sg_for_cpu+0x2e/0x47
[   14.675322]  [<ffffffffa013ec4a>] ? vpeirq+0x7f/0x198 [budget_core]
[   14.675337]  [<ffffffff810946cd>] ? handle_irq_event_percpu+0x166/0x184
[   14.675350]  [<ffffffff81099203>] ? __rcu_process_callbacks+0x71/0x2f8
[   14.675364]  [<ffffffff8104d175>] ? tasklet_action+0x76/0xc5
[   14.675376]  [<ffffffff8120a9ac>] ? eoi_pirq+0x5b/0x77
[   14.675388]  [<ffffffff8104cbc6>] ? __do_softirq+0xc4/0x1a0
[   14.675400]  [<ffffffff8120a022>] ? __xen_evtchn_do_upcall+0x1c7/0x205
[   14.675412]  [<ffffffff8134b06c>] ? call_softirq+0x1c/0x30
[   14.675425]  [<ffffffff8100fa47>] ? do_softirq+0x3f/0x79
[   14.675436]  [<ffffffff8104c996>] ? irq_exit+0x44/0xb5
[   14.675452]  [<ffffffff8120b032>] ? xen_evtchn_do_upcall+0x27/0x32
[   14.675464]  [<ffffffff8134b0be>] ? xen_do_hypervisor_callback+0x1e/0x30
[   14.675473]  <EOI> 

 
Complete log is attached.

 
BR, Carsten.
 
-----Original Message-----
To: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>; 
CC: Konrad Rzeszutek Wilk <konrad@darnok.org>; xen-devel <xen-devel@lists.xensource.com>; Jan Beulich <jbeulich@suse.com>; Sander Eikelenboom <linux@eikelenboom.it>; 
From: Carsten Schiers <carsten@schiers.de>
Sent: Wed 29.02.2012 13:16
Subject: Re: [Xen-devel] Load increase after memory upgrade (part2)
Attachment: inline.txt
 

Great news: it works and the load is back to normal. In the attached graph you can see the peak
in blue (compilation of the patched 3.2.8 kernel) and then, after 16:00, the go-live of the
video DomU. We are below an average of 7% usage (the graph's figures are in per mille).


Thanks so much. Is that already "the final patch"?

 
BR, Carsten.

 

 
-----Original Message-----
To: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>; 
CC: Sander Eikelenboom <linux@eikelenboom.it>; xen-devel <xen-devel@lists.xensource.com>; Jan Beulich <jbeulich@suse.com>; Konrad Rzeszutek Wilk <konrad@darnok.org>; 
From: Carsten Schiers <carsten@schiers.de>
Sent: Tue 28.02.2012 15:39
Subject: Re: [Xen-devel] Load increase after memory upgrade (part2)
Attachment: inline.txt
 

Well, let me check for a longer period of time, and especially whether the DomU is still
working (I can only do that from home), but the load looks pretty good after applying the
patch to 3.2.8 :-D.

 
BR,

Carsten.
 
-----Original Message-----
To: Jan Beulich <JBeulich@suse.com>; 
CC: Konrad Rzeszutek Wilk <konrad@darnok.org>; xen-devel <xen-devel@lists.xensource.com>; Carsten Schiers <carsten@schiers.de>; Sander Eikelenboom <linux@eikelenboom.it>; 
From: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Sent: Fri 17.02.2012 16:18
Subject: Re: [Xen-devel] Load increase after memory upgrade (part2)
On Thu, Feb 16, 2012 at 08:56:53AM +0000, Jan Beulich wrote:
> >>> On 15.02.12 at 20:28, Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> wrote:
> >@@ -1550,7 +1552,11 @@ static void *__vmalloc_area_node(struct vm_struct *area, gfp_t gfp_mask,
> > struct page **pages;
> > unsigned int nr_pages, array_size, i;
> > gfp_t nested_gfp = (gfp_mask & GFP_RECLAIM_MASK) | __GFP_ZERO;
> >-
> >+gfp_t dma_mask = gfp_mask & (__GFP_DMA | __GFP_DMA32);
> >+if (xen_pv_domain()) {
> >+if (dma_mask == (__GFP_DMA | __GFP_DMA32))
> 
> I didn't spot where you force this normally invalid combination, without
> which the change won't affect vmalloc_32() in a 32-bit kernel.
> 
> >+gfp_mask &= (__GFP_DMA | __GFP_DMA32);
> 
> gfp_mask &= ~(__GFP_DMA | __GFP_DMA32);
> 
> Jan

Duh!
Good eyes. Thanks for catching that.

> 
> >+}
> > nr_pages = (area->size - PAGE_SIZE) >> PAGE_SHIFT;
> > array_size = (nr_pages * sizeof(struct page *));
> > 
> 

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel
 
 

 

[-- Attachment #1.2: Type: text/html, Size: 34028 bytes --]

[-- Attachment #2: debug.log --]
[-- Type: application/octet-stream, Size: 21410 bytes --]

[-- Attachment #3: Type: text/plain, Size: 126 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: Load increase after memory upgrade (part2)
  2012-02-29 12:56                                                                   ` Carsten Schiers
@ 2012-05-11  9:39                                                                     ` Carsten Schiers
  2012-05-11 19:41                                                                       ` Konrad Rzeszutek Wilk
  0 siblings, 1 reply; 66+ messages in thread
From: Carsten Schiers @ 2012-05-11  9:39 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk
  Cc: Konrad Rzeszutek Wilk, xen-devel, Jan Beulich, Sander Eikelenboom


[-- Attachment #1.1: Type: text/plain, Size: 7205 bytes --]

Hi Konrad,

 
don't want to be pushy, as I have no real issue. I simply use the Xenified kernel or take the double load. 

But I think this mystery is still open. My last status was that the latest patch you produced resulted in a BUG, 

so we still have not checked whether our theory is correct.

 
BR,

Carsten.
 
-----Original Message-----
From: Carsten Schiers <carsten@schiers.de>
Sent: Wed 29.02.2012 14:01
Subject: Re: [Xen-devel] Load increase after memory upgrade (part2)
Attachments: debug.log, inline.txt
To: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>; 
CC: Sander Eikelenboom <linux@eikelenboom.it>; xen-devel <xen-devel@lists.xensource.com>; Jan Beulich <jbeulich@suse.com>; Konrad Rzeszutek Wilk <konrad@darnok.org>; 
 

I am very sorry. I accidentally started the DomU with the wrong config file, so it's clear why there is no difference
between the two. And unfortunately, the DomU with the correct config file hits a BUG:

 


[   14.674883] BUG: unable to handle kernel paging request at ffffc7fffffff000
[   14.674910] IP: [<ffffffff811b4c0b>] swiotlb_bounce+0x2e/0x31
[   14.674930] PGD 0 
[   14.674940] Oops: 0002 [#1] SMP 
[   14.674952] CPU 0 
[   14.674957] Modules linked in: nfsd exportfs nfs lockd fscache auth_rpcgss nfs_acl sunrpc tda10023 budget_av evdev saa7146_vv videodev v4l2_compat_ioctl32 videobuf_dma_sg videobuf_core budget_core snd_pcm dvb_core snd_timer saa7146 snd ttpci_eeprom soundcore snd_page_alloc i2c_core pcspkr ext3 jbd mbcache xen_netfront xen_blkfront
[   14.675057] 
[   14.675065] Pid: 0, comm: swapper/0 Not tainted 3.2.8-amd64 #1  
[   14.675079] RIP: e030:[<ffffffff811b4c0b>]  [<ffffffff811b4c0b>] swiotlb_bounce+0x2e/0x31
[   14.675097] RSP: e02b:ffff880013fabe58  EFLAGS: 00010202
[   14.675106] RAX: ffff880012800000 RBX: 0000000000000001 RCX: 0000000000001000
[   14.675116] RDX: 0000000000001000 RSI: ffff880012800000 RDI: ffffc7fffffff000
[   14.675126] RBP: 0000000000000002 R08: ffffc7fffffff000 R09: ffff880013f98000
[   14.675137] R10: 0000000000000001 R11: ffff880003376000 R12: ffff8800032c5090
[   14.675147] R13: 0000000000000149 R14: ffff8800033e0000 R15: ffffffff81601fd8
[   14.675163] FS:  00007f3ff9893700(0000) GS:ffff880013fa8000(0000) knlGS:0000000000000000
[   14.675175] CS:  e033 DS: 0000 ES: 0000 CR0: 000000008005003b
[   14.675184] CR2: ffffc7fffffff000 CR3: 0000000012683000 CR4: 0000000000000660
[   14.675195] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[   14.675205] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[   14.675216] Process swapper/0 (pid: 0, threadinfo ffffffff81600000, task ffffffff8160d020)
[   14.675227] Stack:
[   14.675232]  ffffffff81211826 ffff880002eda000 0000000000000000 ffffc90000408000
[   14.675251]  00000000000b0150 0000000000000006 ffffffffa013ec4a ffffffff810946cd
[   14.675270]  ffffffff81099203 ffff880003376000 0000000000000000 ffff880002eda4b0
[   14.675289] Call Trace:
[   14.675295]  <IRQ> 
[   14.675307]  [<ffffffff81211826>] ? xen_swiotlb_sync_sg_for_cpu+0x2e/0x47
[   14.675322]  [<ffffffffa013ec4a>] ? vpeirq+0x7f/0x198 [budget_core]
[   14.675337]  [<ffffffff810946cd>] ? handle_irq_event_percpu+0x166/0x184
[   14.675350]  [<ffffffff81099203>] ? __rcu_process_callbacks+0x71/0x2f8
[   14.675364]  [<ffffffff8104d175>] ? tasklet_action+0x76/0xc5
[   14.675376]  [<ffffffff8120a9ac>] ? eoi_pirq+0x5b/0x77
[   14.675388]  [<ffffffff8104cbc6>] ? __do_softirq+0xc4/0x1a0
[   14.675400]  [<ffffffff8120a022>] ? __xen_evtchn_do_upcall+0x1c7/0x205
[   14.675412]  [<ffffffff8134b06c>] ? call_softirq+0x1c/0x30
[   14.675425]  [<ffffffff8100fa47>] ? do_softirq+0x3f/0x79
[   14.675436]  [<ffffffff8104c996>] ? irq_exit+0x44/0xb5
[   14.675452]  [<ffffffff8120b032>] ? xen_evtchn_do_upcall+0x27/0x32
[   14.675464]  [<ffffffff8134b0be>] ? xen_do_hypervisor_callback+0x1e/0x30
[   14.675473]  <EOI> 

 
Complete log is attached.

 
BR, Carsten.
 
-----Original Message-----
To: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>; 
CC: Konrad Rzeszutek Wilk <konrad@darnok.org>; xen-devel <xen-devel@lists.xensource.com>; Jan Beulich <jbeulich@suse.com>; Sander Eikelenboom <linux@eikelenboom.it>; 
From: Carsten Schiers <carsten@schiers.de>
Sent: Wed 29.02.2012 13:16
Subject: Re: [Xen-devel] Load increase after memory upgrade (part2)
Attachment: inline.txt
 

Great news: it works and the load is back to normal. In the attached graph you can see the peak
in blue (compilation of the patched 3.2.8 kernel) and then, after 16:00, the go-live of the
video DomU. We are below an average of 7% usage (the graph's figures are in per mille).


Thanks so much. Is that already "the final patch"?

 
BR, Carsten.

 

 
-----Original Message-----
To: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>; 
CC: Sander Eikelenboom <linux@eikelenboom.it>; xen-devel <xen-devel@lists.xensource.com>; Jan Beulich <jbeulich@suse.com>; Konrad Rzeszutek Wilk <konrad@darnok.org>; 
From: Carsten Schiers <carsten@schiers.de>
Sent: Tue 28.02.2012 15:39
Subject: Re: [Xen-devel] Load increase after memory upgrade (part2)
Attachment: inline.txt
 

Well, let me check for a longer period of time, and especially whether the DomU is still
working (I can only do that from home), but the load looks pretty good after applying the
patch to 3.2.8 :-D.

 
BR,

Carsten.
 
-----Original Message-----
To: Jan Beulich <JBeulich@suse.com>; 
CC: Konrad Rzeszutek Wilk <konrad@darnok.org>; xen-devel <xen-devel@lists.xensource.com>; Carsten Schiers <carsten@schiers.de>; Sander Eikelenboom <linux@eikelenboom.it>; 
From: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Sent: Fri 17.02.2012 16:18
Subject: Re: [Xen-devel] Load increase after memory upgrade (part2)
On Thu, Feb 16, 2012 at 08:56:53AM +0000, Jan Beulich wrote:
> >>> On 15.02.12 at 20:28, Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> wrote:
> >@@ -1550,7 +1552,11 @@ static void *__vmalloc_area_node(struct vm_struct *area, gfp_t gfp_mask,
> > struct page **pages;
> > unsigned int nr_pages, array_size, i;
> > gfp_t nested_gfp = (gfp_mask & GFP_RECLAIM_MASK) | __GFP_ZERO;
> >-
> >+gfp_t dma_mask = gfp_mask & (__GFP_DMA | __GFP_DMA32);
> >+if (xen_pv_domain()) {
> >+if (dma_mask == (__GFP_DMA | __GFP_DMA32))
> 
> I didn't spot where you force this normally invalid combination, without
> which the change won't affect vmalloc_32() in a 32-bit kernel.
> 
> >+gfp_mask &= (__GFP_DMA | __GFP_DMA32);
> 
> gfp_mask &= ~(__GFP_DMA | __GFP_DMA32);
> 
> Jan

Duh!
Good eyes. Thanks for catching that.

> 
> >+}
> > nr_pages = (area->size - PAGE_SIZE) >> PAGE_SHIFT;
> > array_size = (nr_pages * sizeof(struct page *));
> > 
> 

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel
 
 

 
 

[-- Attachment #1.2: Type: text/html, Size: 35874 bytes --]

[-- Attachment #2: Type: text/plain, Size: 126 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: Load increase after memory upgrade (part2)
  2012-05-11  9:39                                                                     ` Carsten Schiers
@ 2012-05-11 19:41                                                                       ` Konrad Rzeszutek Wilk
  2012-06-13 16:55                                                                         ` Konrad Rzeszutek Wilk
  0 siblings, 1 reply; 66+ messages in thread
From: Konrad Rzeszutek Wilk @ 2012-05-11 19:41 UTC (permalink / raw)
  To: Carsten Schiers
  Cc: Konrad Rzeszutek Wilk, xen-devel, Jan Beulich, Sander Eikelenboom

On Fri, May 11, 2012 at 11:39:08AM +0200, Carsten Schiers wrote:
> Hi Konrad,
> 
>  
> don't want to be pushy, as I have no real issue. I simply use the Xenified kernel or take the double load. 
> 
> But I think this mystery is still open. My last status was that the latest patch you produced resulted in a BUG, 

Yes, that is right. Thank you for reminding me.
> 
> so we still have not checked whether our theory is correct.

No, we haven't. And I should have no trouble reproducing this. I can just write
a tiny module that allocates vmalloc_32().
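
Something along these lines should do - a hypothetical throwaway module
(file and function names made up here), just enough to exercise the
vmalloc_32() path on the affected box:

    /* vmalloc32_test.c: allocate through vmalloc_32() so that the
     * __GFP_DMA32 handling in __vmalloc_area_node() is exercised. */
    #include <linux/module.h>
    #include <linux/mm.h>
    #include <linux/vmalloc.h>

    static void *buf;

    static int __init vmalloc32_test_init(void)
    {
            buf = vmalloc_32(64 * PAGE_SIZE); /* wants MFNs below 4GB */
            if (!buf)
                    return -ENOMEM;
            pr_info("vmalloc32_test: got %p\n", buf);
            return 0;
    }

    static void __exit vmalloc32_test_exit(void)
    {
            vfree(buf);
    }

    module_init(vmalloc32_test_init);
    module_exit(vmalloc32_test_exit);
    MODULE_LICENSE("GPL");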

But your timing sucks - I am going on a week's vacation next week :-(

Ah, if there were just a cloning machine - I could stick myself in it,
Baseline_0 goes on vacation while Clone_1 keeps working. Then
git merge Baseline_0 and Clone_1 in a week, fix up the merge conflicts,
and continue on. Sigh.

Can I ask you to be patient with me once more and ping me in a week - when
I am back from vacation and my brain is fresh to work on this?

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: Load increase after memory upgrade (part2)
  2012-05-11 19:41                                                                       ` Konrad Rzeszutek Wilk
@ 2012-06-13 16:55                                                                         ` Konrad Rzeszutek Wilk
  2012-06-14  7:07                                                                           ` Jan Beulich
                                                                                             ` (2 more replies)
  0 siblings, 3 replies; 66+ messages in thread
From: Konrad Rzeszutek Wilk @ 2012-06-13 16:55 UTC (permalink / raw)
  To: Carsten Schiers
  Cc: Konrad Rzeszutek Wilk, xen-devel, Jan Beulich, Sander Eikelenboom

On Fri, May 11, 2012 at 03:41:38PM -0400, Konrad Rzeszutek Wilk wrote:
> On Fri, May 11, 2012 at 11:39:08AM +0200, Carsten Schiers wrote:
> > Hi Konrad,
> > 
> >  
> > don't want to be pushy, as I have no real issue. I simply use the Xenified kernel or take the double load. 
> > 
> > But I think this mystery is still open. My last status was that the latest patch you produced resulted in a BUG, 
> 
> Yes, that is right. Thank you for reminding me.
> > 
> > so we still have not checked whether our theory is correct.
> 
> No, we haven't. And I should have no trouble reproducing this. I can just write
> a tiny module that allocates vmalloc_32().

Done. Found some bugs... and here is a new version. Can you please
try it out? It has #define DEBUG 1 set, so it should print a lot of
stuff when the DVB module loads. If it crashes, please send me the full log.

Thanks.
>From 5afb4ab1fb3d2b059fe1a6db93ab65cb76f43b8a Mon Sep 17 00:00:00 2001
From: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Date: Thu, 31 May 2012 14:21:04 -0400
Subject: [PATCH] xen/vmalloc_32: Use xen_exchange_.. when GFP flags are DMA.
 [v3]

Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
---
 arch/x86/xen/mmu.c    |  187 +++++++++++++++++++++++++++++++++++++++++++++++-
 include/xen/xen-ops.h |    2 +
 mm/vmalloc.c          |   18 +++++-
 3 files changed, 202 insertions(+), 5 deletions(-)

diff --git a/arch/x86/xen/mmu.c b/arch/x86/xen/mmu.c
index 3a73785..960d206 100644
--- a/arch/x86/xen/mmu.c
+++ b/arch/x86/xen/mmu.c
@@ -47,6 +47,7 @@
 #include <linux/gfp.h>
 #include <linux/memblock.h>
 #include <linux/seq_file.h>
+#include <linux/slab.h>
 
 #include <trace/events/xen.h>
 
@@ -2051,6 +2052,7 @@ void __init xen_init_mmu_ops(void)
 /* Protected by xen_reservation_lock. */
 #define MAX_CONTIG_ORDER 9 /* 2MB */
 static unsigned long discontig_frames[1<<MAX_CONTIG_ORDER];
+static unsigned long limited_frames[1<<MAX_CONTIG_ORDER];
 
 #define VOID_PTE (mfn_pte(0, __pgprot(0)))
 static void xen_zap_pfn_range(unsigned long vaddr, unsigned int order,
@@ -2075,6 +2077,42 @@ static void xen_zap_pfn_range(unsigned long vaddr, unsigned int order,
 	}
 	xen_mc_issue(0);
 }
+static int xen_zap_page_range(struct page *pages, unsigned int order,
+				unsigned long *in_frames,
+				unsigned long *out_frames,
+				void *limit_bitmap)
+{
+	int i, n = 0;
+	struct multicall_space mcs;
+	struct page *page;
+
+	xen_mc_batch();
+	for (i = 0; i < (1UL<<order); i++) {
+		if (!test_bit(i, limit_bitmap))
+			continue;
+
+		page = &pages[i];
+		mcs = __xen_mc_entry(0);
+#define DEBUG 1
+		if (in_frames) {
+#ifdef DEBUG
+			printk(KERN_INFO "%s:%d 0x%lx(pfn) 0x%lx (mfn) 0x%lx(vaddr)\n",
+				__func__, i, page_to_pfn(page),
+				pfn_to_mfn(page_to_pfn(page)), page_address(page));
+#endif
+			in_frames[i] = pfn_to_mfn(page_to_pfn(page));
+		}
+		MULTI_update_va_mapping(mcs.mc, (unsigned long)page_address(page), VOID_PTE, 0);
+		set_phys_to_machine(page_to_pfn(page), INVALID_P2M_ENTRY);
+
+		if (out_frames)
+			out_frames[i] = page_to_pfn(page);
+		++n;
+
+	}
+	xen_mc_issue(0);
+	return n;
+}
 
 /*
  * Update the pfn-to-mfn mappings for a virtual address range, either to
@@ -2118,6 +2156,53 @@ static void xen_remap_exchanged_ptes(unsigned long vaddr, int order,
 
 	xen_mc_issue(0);
 }
+static void xen_remap_exchanged_pages(struct page *pages, int order,
+				     unsigned long *mfns,
+				     unsigned long first_mfn, /* in_frame if we failed*/
+				     void *limit_map)
+{
+	unsigned i, limit;
+	unsigned long mfn;
+	struct page *page;
+
+	xen_mc_batch();
+
+	limit = 1ULL << order;
+	for (i = 0; i < limit; i++) {
+		struct multicall_space mcs;
+		unsigned flags;
+
+		if (!test_bit(i, limit_map))
+			continue;
+
+		page = &pages[i];
+		mcs = __xen_mc_entry(0);
+		if (mfns)
+			mfn = mfns[i];
+		else
+			mfn = first_mfn + i;
+
+		if (i < (limit - 1))
+			flags = 0;
+		else {
+			if (order == 0)
+				flags = UVMF_INVLPG | UVMF_ALL;
+			else
+				flags = UVMF_TLB_FLUSH | UVMF_ALL;
+		}
+#ifdef DEBUG
+		printk(KERN_INFO "%s (%d) pfn:0x%lx, pfn: 0x%lx vaddr: 0x%lx\n",
+			__func__, i, page_to_pfn(page), mfn, page_address(page));
+#endif
+		MULTI_update_va_mapping(mcs.mc, (unsigned long)page_address(page),
+				mfn_pte(mfn, PAGE_KERNEL), flags);
+
+		set_phys_to_machine(page_to_pfn(page), mfn);
+	}
+
+	xen_mc_issue(0);
+}
+
 
 /*
  * Perform the hypercall to exchange a region of our pfns to point to
@@ -2136,7 +2221,9 @@ static int xen_exchange_memory(unsigned long extents_in, unsigned int order_in,
 {
 	long rc;
 	int success;
-
+#ifdef DEBUG
+	int i;
+#endif
 	struct xen_memory_exchange exchange = {
 		.in = {
 			.nr_extents   = extents_in,
@@ -2157,7 +2244,11 @@ static int xen_exchange_memory(unsigned long extents_in, unsigned int order_in,
 
 	rc = HYPERVISOR_memory_op(XENMEM_exchange, &exchange);
 	success = (exchange.nr_exchanged == extents_in);
-
+#ifdef DEBUG
+	for (i = 0; i <  exchange.nr_exchanged; i++) {
+		printk(KERN_INFO "%s 0x%lx (mfn) <-> 0x%lx (mfn)\n",  __func__,pfns_in[i], mfns_out[i]);
+	}
+#endif
 	BUG_ON(!success && ((exchange.nr_exchanged != 0) || (rc == 0)));
 	BUG_ON(success && (rc != 0));
 
@@ -2231,8 +2322,8 @@ void xen_destroy_contiguous_region(unsigned long vstart, unsigned int order)
 	xen_zap_pfn_range(vstart, order, NULL, out_frames);
 
 	/* 3. Do the exchange for non-contiguous MFNs. */
-	success = xen_exchange_memory(1, order, &in_frame, 1UL << order,
-					0, out_frames, 0);
+	success = xen_exchange_memory(1, order, &in_frame,
+				      1UL << order, 0, out_frames, 0);
 
 	/* 4. Map new pages in place of old pages. */
 	if (success)
@@ -2244,6 +2335,94 @@ void xen_destroy_contiguous_region(unsigned long vstart, unsigned int order)
 }
 EXPORT_SYMBOL_GPL(xen_destroy_contiguous_region);
 
+int xen_limit_pages_to_max_mfn(struct page *pages, unsigned int order,
+			       unsigned int address_bits)
+{
+	unsigned long *in_frames = discontig_frames, *out_frames = limited_frames;
+	unsigned long  flags;
+	struct page *page;
+	int success;
+	int i, n = 0;
+	unsigned long _limit_map;
+	unsigned long *limit_map;
+
+	if (xen_feature(XENFEAT_auto_translated_physmap))
+		return 0;
+
+	if (unlikely(order > MAX_CONTIG_ORDER))
+		return -ENOMEM;
+
+	if (BITS_PER_LONG >> order) {
+		limit_map = kzalloc(BITS_TO_LONGS(1U << order) *
+				    sizeof(*limit_map), GFP_KERNEL);
+		if (unlikely(!limit_map))
+			return -ENOMEM;
+	} else
+		limit_map = &_limit_map;
+
+	/* 0. Construct our per page bitmap lookup. */
+
+	if (address_bits && (address_bits < PAGE_SHIFT))
+			return -EINVAL;
+
+	if (order)
+		bitmap_zero(limit_map, 1U << order);
+	else
+		__set_bit(0, limit_map);
+
+	/* 1. Clear the pages */
+	for (i = 0; i < (1ULL << order); i++) {
+		void *vaddr;
+		page = &pages[i];
+
+		vaddr = page_address(page);
+#ifdef DEBUG
+		printk(KERN_INFO "%s: page: %p vaddr: %p 0x%lx(mfn) 0x%lx(pfn)\n", __func__, page, vaddr, virt_to_mfn(vaddr), mfn_to_pfn(virt_to_mfn(vaddr)));
+#endif
+		if (address_bits) {
+			if (!(virt_to_mfn(vaddr) >> (address_bits - PAGE_SHIFT)))
+				continue;
+			__set_bit(i, limit_map);
+		}
+		if (!PageHighMem(page))
+			memset(vaddr, 0, PAGE_SIZE);
+		else {
+			memset(kmap(page), 0, PAGE_SIZE);
+			kunmap(page);
+			++n;
+		}
+	}
+	/* Check to see if we actually have to do any work. */
+	if (bitmap_empty(limit_map, 1U << order)) {
+		if (limit_map != &_limit_map)
+			kfree(limit_map);
+		return 0;
+	}
+	if (n)
+		kmap_flush_unused();
+
+	spin_lock_irqsave(&xen_reservation_lock, flags);
+
+	/* 2. Zap current PTEs. */
+	n = xen_zap_page_range(pages, order, in_frames, NULL /*out_frames */, limit_map);
+
+	/* 3. Do the exchange for non-contiguous MFNs. */
+	success = xen_exchange_memory(n, 0 /* this is always called per page */, in_frames,
+				      n, 0, out_frames, address_bits);
+
+	/* 4. Map new pages in place of old pages. */
+	if (success)
+		xen_remap_exchanged_pages(pages, order, out_frames, 0, limit_map);
+	else
+		xen_remap_exchanged_pages(pages, order, NULL, *in_frames, limit_map);
+
+	spin_unlock_irqrestore(&xen_reservation_lock, flags);
+	if (limit_map != &_limit_map)
+		kfree(limit_map);
+
+	return success ? 0 : -ENOMEM;
+}
+EXPORT_SYMBOL_GPL(xen_limit_pages_to_max_mfn);
 #ifdef CONFIG_XEN_PVHVM
 static void xen_hvm_exit_mmap(struct mm_struct *mm)
 {
diff --git a/include/xen/xen-ops.h b/include/xen/xen-ops.h
index 6a198e4..2f8709f 100644
--- a/include/xen/xen-ops.h
+++ b/include/xen/xen-ops.h
@@ -29,4 +29,6 @@ int xen_remap_domain_mfn_range(struct vm_area_struct *vma,
 			       unsigned long mfn, int nr,
 			       pgprot_t prot, unsigned domid);
 
+int xen_limit_pages_to_max_mfn(struct page *pages, unsigned int order,
+			       unsigned int address_bits);
 #endif /* INCLUDE_XEN_OPS_H */
diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index 2aad499..194af07 100644
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -31,6 +31,8 @@
 #include <asm/tlbflush.h>
 #include <asm/shmparam.h>
 
+#include <xen/xen.h>
+#include <xen/xen-ops.h>
 /*** Page table manipulation functions ***/
 
 static void vunmap_pte_range(pmd_t *pmd, unsigned long addr, unsigned long end)
@@ -1576,7 +1578,11 @@ static void *__vmalloc_area_node(struct vm_struct *area, gfp_t gfp_mask,
 	struct page **pages;
 	unsigned int nr_pages, array_size, i;
 	gfp_t nested_gfp = (gfp_mask & GFP_RECLAIM_MASK) | __GFP_ZERO;
-
+	gfp_t dma_mask = gfp_mask & (__GFP_DMA | __GFP_DMA32);
+	if (xen_pv_domain()) {
+		if (dma_mask == (__GFP_DMA | __GFP_DMA32))
+			gfp_mask &= ~(__GFP_DMA | __GFP_DMA32);
+	}
 	nr_pages = (area->size - PAGE_SIZE) >> PAGE_SHIFT;
 	array_size = (nr_pages * sizeof(struct page *));
 
@@ -1612,6 +1618,16 @@ static void *__vmalloc_area_node(struct vm_struct *area, gfp_t gfp_mask,
 			goto fail;
 		}
 		area->pages[i] = page;
+		if (xen_pv_domain()) {
+			if (dma_mask) {
+				if (xen_limit_pages_to_max_mfn(page, 0, 32)) {
+					area->nr_pages = i + 1;
+					goto fail;
+				}
+			if (gfp_mask & __GFP_ZERO)
+				clear_highpage(page);
+			}
+		}
 	}
 
 	if (map_vm_area(area, prot, &pages))
-- 
1.7.7.6

^ permalink raw reply related	[flat|nested] 66+ messages in thread

* Re: Load increase after memory upgrade (part2)
  2012-06-13 16:55                                                                         ` Konrad Rzeszutek Wilk
@ 2012-06-14  7:07                                                                           ` Jan Beulich
  2012-06-14 18:33                                                                             ` Konrad Rzeszutek Wilk
  2012-06-14 18:43                                                                             ` Carsten Schiers
  2012-06-14  8:38                                                                           ` David Vrabel
  2012-06-14 18:40                                                                           ` Carsten Schiers
  2 siblings, 2 replies; 66+ messages in thread
From: Jan Beulich @ 2012-06-14  7:07 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk
  Cc: Konrad Rzeszutek Wilk, xen-devel, Carsten Schiers, Sander Eikelenboom

>>> On 13.06.12 at 18:55, Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> wrote:
> @@ -1576,7 +1578,11 @@ static void *__vmalloc_area_node(struct vm_struct *area, gfp_t gfp_mask,
>  	struct page **pages;
>  	unsigned int nr_pages, array_size, i;
>  	gfp_t nested_gfp = (gfp_mask & GFP_RECLAIM_MASK) | __GFP_ZERO;
> -
> +	gfp_t dma_mask = gfp_mask & (__GFP_DMA | __GFP_DMA32);
> +	if (xen_pv_domain()) {
> +		if (dma_mask == (__GFP_DMA | __GFP_DMA32))

As said in an earlier reply - without having any place that would
ever set both flags at once, this whole conditional is meaningless.
In our code - which I suppose is where you cloned this from - we
set GFP_VMALLOC32 to such a value for 32-bit kernels (which
otherwise would merely use GFP_KERNEL, and hence not trigger
the code calling xen_limit_pages_to_max_mfn()). I don't recall
though whether Carsten's problem was on a 32- or 64-bit kernel.

Jan

> +			gfp_mask &= ~(__GFP_DMA | __GFP_DMA32);
> +	}
>  	nr_pages = (area->size - PAGE_SIZE) >> PAGE_SHIFT;
>  	array_size = (nr_pages * sizeof(struct page *));
>  
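
For reference, the definition Jan is alluding to sits right above
vmalloc_32() in mainline mm/vmalloc.c and looks roughly like this
(from memory - verify against the exact tree in use; only the
out-of-tree/Xenified kernels define it with both DMA flags at once):

    /* roughly the mainline mm/vmalloc.c definition */
    #if defined(CONFIG_64BIT) && defined(CONFIG_ZONE_DMA32)
    #define GFP_VMALLOC32 GFP_DMA32 | GFP_KERNEL
    #elif defined(CONFIG_64BIT) && defined(CONFIG_ZONE_DMA)
    #define GFP_VMALLOC32 GFP_DMA | GFP_KERNEL
    #else
    #define GFP_VMALLOC32 GFP_KERNEL
    #endif

So a 32-bit kernel passes plain GFP_KERNEL down from vmalloc_32(), and
a test for (__GFP_DMA | __GFP_DMA32) being set together can never fire
there - which is Jan's point.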

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: Load increase after memory upgrade (part2)
  2012-06-13 16:55                                                                         ` Konrad Rzeszutek Wilk
  2012-06-14  7:07                                                                           ` Jan Beulich
@ 2012-06-14  8:38                                                                           ` David Vrabel
  2012-06-14 18:31                                                                             ` Konrad Rzeszutek Wilk
  2012-06-14 18:40                                                                           ` Carsten Schiers
  2 siblings, 1 reply; 66+ messages in thread
From: David Vrabel @ 2012-06-14  8:38 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk
  Cc: Konrad Rzeszutek Wilk, xen-devel, Carsten Schiers, Jan Beulich,
	Sander Eikelenboom

On 13/06/12 17:55, Konrad Rzeszutek Wilk wrote:
> 
> +	/* 3. Do the exchange for non-contiguous MFNs. */
> +	success = xen_exchange_memory(n, 0 /* this is always called per page */, in_frames,
> +				      n, 0, out_frames, address_bits);

vmalloc() does not require physically contiguous MFNs.

David

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: Load increase after memory upgrade (part2)
  2012-06-14  8:38                                                                           ` David Vrabel
@ 2012-06-14 18:31                                                                             ` Konrad Rzeszutek Wilk
  0 siblings, 0 replies; 66+ messages in thread
From: Konrad Rzeszutek Wilk @ 2012-06-14 18:31 UTC (permalink / raw)
  To: David Vrabel
  Cc: Konrad Rzeszutek Wilk, xen-devel, Carsten Schiers, Jan Beulich,
	Sander Eikelenboom

On Thu, Jun 14, 2012 at 09:38:31AM +0100, David Vrabel wrote:
> On 13/06/12 17:55, Konrad Rzeszutek Wilk wrote:
> > 
> > +	/* 3. Do the exchange for non-contiguous MFNs. */
> > +	success = xen_exchange_memory(n, 0 /* this is always called per page */, in_frames,
> > +				      n, 0, out_frames, address_bits);
> 
> vmalloc() does not require physically contiguous MFNs.

<nods> It doesn't matter that much in this context, as the vmalloc code
calls this per page - so only one page is swizzled at a time.

> 
> David
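
For anyone following along: vmalloc() builds its area from individually
allocated order-0 pages and maps them into one virtual range, which is
why the MFN behind each page can be exchanged on its own. A sketch of
that pattern - a hypothetical helper, not code from the patch:

    #include <linux/gfp.h>
    #include <linux/mm.h>
    #include <linux/vmalloc.h>

    /* Map four independently allocated - possibly discontiguous -
     * pages into one contiguous virtual range, essentially what
     * __vmalloc_area_node() does internally. */
    static void *map_four_pages(void)
    {
            struct page *pages[4];
            int i;

            for (i = 0; i < 4; i++) {
                    pages[i] = alloc_page(GFP_KERNEL);
                    if (!pages[i])
                            return NULL; /* sketch: leaks earlier pages */
            }
            return vmap(pages, 4, VM_MAP, PAGE_KERNEL);
    }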

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: Load increase after memory upgrade (part2)
  2012-06-14  7:07                                                                           ` Jan Beulich
@ 2012-06-14 18:33                                                                             ` Konrad Rzeszutek Wilk
  2012-06-14 18:43                                                                             ` Carsten Schiers
  1 sibling, 0 replies; 66+ messages in thread
From: Konrad Rzeszutek Wilk @ 2012-06-14 18:33 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Konrad Rzeszutek Wilk, xen-devel, Carsten Schiers, Sander Eikelenboom

On Thu, Jun 14, 2012 at 08:07:55AM +0100, Jan Beulich wrote:
> >>> On 13.06.12 at 18:55, Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> wrote:
> > @@ -1576,7 +1578,11 @@ static void *__vmalloc_area_node(struct vm_struct *area, gfp_t gfp_mask,
> >  	struct page **pages;
> >  	unsigned int nr_pages, array_size, i;
> >  	gfp_t nested_gfp = (gfp_mask & GFP_RECLAIM_MASK) | __GFP_ZERO;
> > -
> > +	gfp_t dma_mask = gfp_mask & (__GFP_DMA | __GFP_DMA32);
> > +	if (xen_pv_domain()) {
> > +		if (dma_mask == (__GFP_DMA | __GFP_DMA32))
> 
> As said in an earlier reply - without having any place that would
> ever set both flags at once, this whole conditional is meaningless.
> In our code - which I suppose is where you cloned this from - we

Yup.
> set GFP_VMALLOC32 to such a value for 32-bit kernels (which
> otherwise would merely use GFP_KERNEL, and hence not trigger

Ah, let me double check. Thanks for looking out for this.

> the code calling xen_limit_pages_to_max_mfn()). I don't recall
> though whether Carsten's problem was on a 32- or 64-bit kernel.
> 
> Jan
> 
> > +			gfp_mask &= ~(__GFP_DMA | __GFP_DMA32);
> > +	}
> >  	nr_pages = (area->size - PAGE_SIZE) >> PAGE_SHIFT;
> >  	array_size = (nr_pages * sizeof(struct page *));
> >  
> 
> 
> 
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@lists.xen.org
> http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: Load increase after memory upgrade (part2)
  2012-06-13 16:55                                                                         ` Konrad Rzeszutek Wilk
  2012-06-14  7:07                                                                           ` Jan Beulich
  2012-06-14  8:38                                                                           ` David Vrabel
@ 2012-06-14 18:40                                                                           ` Carsten Schiers
  2012-06-14 19:16                                                                             ` Carsten Schiers
  2 siblings, 1 reply; 66+ messages in thread
From: Carsten Schiers @ 2012-06-14 18:40 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk
  Cc: Konrad Rzeszutek Wilk, xen-devel, Jan Beulich, Sander Eikelenboom

Konrad, against which kernel version did you produce this patch? It will not apply cleanly
to 3.4.2 at least; I will look for some older version now...

-----Original Message-----
From: xen-devel-bounces@lists.xen.org [mailto:xen-devel-bounces@lists.xen.org] On Behalf Of Konrad Rzeszutek Wilk
Sent: Wednesday, 13 June 2012 18:55
To: Carsten Schiers
Cc: Konrad Rzeszutek Wilk; xen-devel; Jan Beulich; Sander Eikelenboom
Subject: Re: [Xen-devel] Load increase after memory upgrade (part2)

On Fri, May 11, 2012 at 03:41:38PM -0400, Konrad Rzeszutek Wilk wrote:
> On Fri, May 11, 2012 at 11:39:08AM +0200, Carsten Schiers wrote:
> > Hi Konrad,
> > 
> >  
> > don't want to be pushy, as I have no real issue. I simply use the Xenified kernel or take the double load. 
> > 
> > But I think this mystery is still open. My last status was that the 
> > latest patch you produced resulted in a BUG,
> 
> Yes, that is right. Thank you for reminding me.
> > 
> > so we still have not checked whether our theory is correct.
> 
> No, we haven't. And I should have no trouble reproducing this. I can 
> just write a tiny module that allocates vmalloc_32().

Done. Found some bugs... and here is a new version. Can you please try it out? It has #define DEBUG 1 set, so it should print a lot of stuff when the DVB module loads. If it crashes, please send me the full log.

Thanks.
>From 5afb4ab1fb3d2b059fe1a6db93ab65cb76f43b8a Mon Sep 17 00:00:00 2001
From: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Date: Thu, 31 May 2012 14:21:04 -0400
Subject: [PATCH] xen/vmalloc_32: Use xen_exchange_.. when GFP flags are DMA.
 [v3]

Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
---
 arch/x86/xen/mmu.c    |  187 +++++++++++++++++++++++++++++++++++++++++++++++-
 include/xen/xen-ops.h |    2 +
 mm/vmalloc.c          |   18 +++++-
 3 files changed, 202 insertions(+), 5 deletions(-)

diff --git a/arch/x86/xen/mmu.c b/arch/x86/xen/mmu.c
index 3a73785..960d206 100644
--- a/arch/x86/xen/mmu.c
+++ b/arch/x86/xen/mmu.c
@@ -47,6 +47,7 @@
 #include <linux/gfp.h>
 #include <linux/memblock.h>
 #include <linux/seq_file.h>
+#include <linux/slab.h>
 
 #include <trace/events/xen.h>
 
@@ -2051,6 +2052,7 @@ void __init xen_init_mmu_ops(void)
 /* Protected by xen_reservation_lock. */
 #define MAX_CONTIG_ORDER 9 /* 2MB */
 static unsigned long discontig_frames[1<<MAX_CONTIG_ORDER];
+static unsigned long limited_frames[1<<MAX_CONTIG_ORDER];
 
 #define VOID_PTE (mfn_pte(0, __pgprot(0)))
 static void xen_zap_pfn_range(unsigned long vaddr, unsigned int order,
@@ -2075,6 +2077,42 @@ static void xen_zap_pfn_range(unsigned long vaddr, unsigned int order,
 	}
 	xen_mc_issue(0);
 }
+static int xen_zap_page_range(struct page *pages, unsigned int order,
+				unsigned long *in_frames,
+				unsigned long *out_frames,
+				void *limit_bitmap)
+{
+	int i, n = 0;
+	struct multicall_space mcs;
+	struct page *page;
+
+	xen_mc_batch();
+	for (i = 0; i < (1UL<<order); i++) {
+		if (!test_bit(i, limit_bitmap))
+			continue;
+
+		page = &pages[i];
+		mcs = __xen_mc_entry(0);
+#define DEBUG 1
+		if (in_frames) {
+#ifdef DEBUG
+			printk(KERN_INFO "%s:%d 0x%lx(pfn) 0x%lx (mfn) 0x%lx(vaddr)\n",
+				__func__, i, page_to_pfn(page),
+				pfn_to_mfn(page_to_pfn(page)), page_address(page));
+#endif
+			in_frames[i] = pfn_to_mfn(page_to_pfn(page));
+		}
+		MULTI_update_va_mapping(mcs.mc, (unsigned long)page_address(page), VOID_PTE, 0);
+		set_phys_to_machine(page_to_pfn(page), INVALID_P2M_ENTRY);
+
+		if (out_frames)
+			out_frames[i] = page_to_pfn(page);
+		++n;
+
+	}
+	xen_mc_issue(0);
+	return n;
+}
 
 /*
  * Update the pfn-to-mfn mappings for a virtual address range, either to
@@ -2118,6 +2156,53 @@ static void xen_remap_exchanged_ptes(unsigned long vaddr, int order,
 
 	xen_mc_issue(0);
 }
+static void xen_remap_exchanged_pages(struct page *pages, int order,
+				     unsigned long *mfns,
+				     unsigned long first_mfn, /* in_frame if we failed*/
+				     void *limit_map)
+{
+	unsigned i, limit;
+	unsigned long mfn;
+	struct page *page;
+
+	xen_mc_batch();
+
+	limit = 1ULL << order;
+	for (i = 0; i < limit; i++) {
+		struct multicall_space mcs;
+		unsigned flags;
+
+		if (!test_bit(i, limit_map))
+			continue;
+
+		page = &pages[i];
+		mcs = __xen_mc_entry(0);
+		if (mfns)
+			mfn = mfns[i];
+		else
+			mfn = first_mfn + i;
+
+		if (i < (limit - 1))
+			flags = 0;
+		else {
+			if (order == 0)
+				flags = UVMF_INVLPG | UVMF_ALL;
+			else
+				flags = UVMF_TLB_FLUSH | UVMF_ALL;
+		}
+#ifdef DEBUG
+		printk(KERN_INFO "%s (%d) pfn:0x%lx, pfn: 0x%lx vaddr: 0x%lx\n",
+			__func__, i, page_to_pfn(page), mfn, page_address(page));
+#endif
+		MULTI_update_va_mapping(mcs.mc, (unsigned long)page_address(page),
+				mfn_pte(mfn, PAGE_KERNEL), flags);
+
+		set_phys_to_machine(page_to_pfn(page), mfn);
+	}
+
+	xen_mc_issue(0);
+}
+
 
 /*
  * Perform the hypercall to exchange a region of our pfns to point to
@@ -2136,7 +2221,9 @@ static int xen_exchange_memory(unsigned long extents_in, unsigned int order_in,
 {
 	long rc;
 	int success;
-
+#ifdef DEBUG
+	int i;
+#endif
 	struct xen_memory_exchange exchange = {
 		.in = {
 			.nr_extents   = extents_in,
@@ -2157,7 +2244,11 @@ static int xen_exchange_memory(unsigned long extents_in, unsigned int order_in,
 
 	rc = HYPERVISOR_memory_op(XENMEM_exchange, &exchange);
 	success = (exchange.nr_exchanged == extents_in);
-
+#ifdef DEBUG
+	for (i = 0; i <  exchange.nr_exchanged; i++) {
+		printk(KERN_INFO "%s 0x%lx (mfn) <-> 0x%lx (mfn)\n",  __func__,pfns_in[i], mfns_out[i]);
+	}
+#endif
 	BUG_ON(!success && ((exchange.nr_exchanged != 0) || (rc == 0)));
 	BUG_ON(success && (rc != 0));
 
@@ -2231,8 +2322,8 @@ void xen_destroy_contiguous_region(unsigned long vstart, unsigned int order)
 	xen_zap_pfn_range(vstart, order, NULL, out_frames);
 
 	/* 3. Do the exchange for non-contiguous MFNs. */
-	success = xen_exchange_memory(1, order, &in_frame, 1UL << order,
-					0, out_frames, 0);
+	success = xen_exchange_memory(1, order, &in_frame,
+				      1UL << order, 0, out_frames, 0);
 
 	/* 4. Map new pages in place of old pages. */
 	if (success)
@@ -2244,6 +2335,94 @@ void xen_destroy_contiguous_region(unsigned long vstart, unsigned int order)
 }
 EXPORT_SYMBOL_GPL(xen_destroy_contiguous_region);
 
+int xen_limit_pages_to_max_mfn(struct page *pages, unsigned int order,
+			       unsigned int address_bits)
+{
+	unsigned long *in_frames = discontig_frames, *out_frames = limited_frames;
+	unsigned long  flags;
+	struct page *page;
+	int success;
+	int i, n = 0;
+	unsigned long _limit_map;
+	unsigned long *limit_map;
+
+	if (xen_feature(XENFEAT_auto_translated_physmap))
+		return 0;
+
+	if (unlikely(order > MAX_CONTIG_ORDER))
+		return -ENOMEM;
+
+	if (BITS_PER_LONG >> order) {
+		limit_map = kzalloc(BITS_TO_LONGS(1U << order) *
+				    sizeof(*limit_map), GFP_KERNEL);
+		if (unlikely(!limit_map))
+			return -ENOMEM;
+	} else
+		limit_map = &_limit_map;
+
+	/* 0. Construct our per page bitmap lookup. */
+
+	if (address_bits && (address_bits < PAGE_SHIFT))
+			return -EINVAL;
+
+	if (order)
+		bitmap_zero(limit_map, 1U << order);
+	else
+		__set_bit(0, limit_map);
+
+	/* 1. Clear the pages */
+	for (i = 0; i < (1ULL << order); i++) {
+		void *vaddr;
+		page = &pages[i];
+
+		vaddr = page_address(page);
+#ifdef DEBUG
+		printk(KERN_INFO "%s: page: %p vaddr: %p 0x%lx(mfn) 0x%lx(pfn)\n", __func__, page, vaddr, virt_to_mfn(vaddr), mfn_to_pfn(virt_to_mfn(vaddr)));
+#endif
+		if (address_bits) {
+			if (!(virt_to_mfn(vaddr) >> (address_bits - PAGE_SHIFT)))
+				continue;
+			__set_bit(i, limit_map);
+		}
+		if (!PageHighMem(page))
+			memset(vaddr, 0, PAGE_SIZE);
+		else {
+			memset(kmap(page), 0, PAGE_SIZE);
+			kunmap(page);
+			++n;
+		}
+	}
+	/* Check to see if we actually have to do any work. */
+	if (bitmap_empty(limit_map, 1U << order)) {
+		if (limit_map != &_limit_map)
+			kfree(limit_map);
+		return 0;
+	}
+	if (n)
+		kmap_flush_unused();
+
+	spin_lock_irqsave(&xen_reservation_lock, flags);
+
+	/* 2. Zap current PTEs. */
+	n = xen_zap_page_range(pages, order, in_frames, NULL /*out_frames */, limit_map);
+
+	/* 3. Do the exchange for non-contiguous MFNs. */
+	success = xen_exchange_memory(n, 0 /* this is always called per page */, in_frames,
+				      n, 0, out_frames, address_bits);
+
+	/* 4. Map new pages in place of old pages. */
+	if (success)
+		xen_remap_exchanged_pages(pages, order, out_frames, 0, limit_map);
+	else
+		xen_remap_exchanged_pages(pages, order, NULL, *in_frames, limit_map);
+
+	spin_unlock_irqrestore(&xen_reservation_lock, flags);
+	if (limit_map != &_limit_map)
+		kfree(limit_map);
+
+	return success ? 0 : -ENOMEM;
+}
+EXPORT_SYMBOL_GPL(xen_limit_pages_to_max_mfn);
 #ifdef CONFIG_XEN_PVHVM
 static void xen_hvm_exit_mmap(struct mm_struct *mm)
 {
diff --git a/include/xen/xen-ops.h b/include/xen/xen-ops.h
index 6a198e4..2f8709f 100644
--- a/include/xen/xen-ops.h
+++ b/include/xen/xen-ops.h
@@ -29,4 +29,6 @@ int xen_remap_domain_mfn_range(struct vm_area_struct *vma,
 			       unsigned long mfn, int nr,
 			       pgprot_t prot, unsigned domid);
 
+int xen_limit_pages_to_max_mfn(struct page *pages, unsigned int order,
+			       unsigned int address_bits);
 #endif /* INCLUDE_XEN_OPS_H */
diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index 2aad499..194af07 100644
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -31,6 +31,8 @@
 #include <asm/tlbflush.h>
 #include <asm/shmparam.h>
 
+#include <xen/xen.h>
+#include <xen/xen-ops.h>
 /*** Page table manipulation functions ***/
 
 static void vunmap_pte_range(pmd_t *pmd, unsigned long addr, unsigned long end)
@@ -1576,7 +1578,11 @@ static void *__vmalloc_area_node(struct vm_struct *area, gfp_t gfp_mask,
 	struct page **pages;
 	unsigned int nr_pages, array_size, i;
 	gfp_t nested_gfp = (gfp_mask & GFP_RECLAIM_MASK) | __GFP_ZERO;
-
+	gfp_t dma_mask = gfp_mask & (__GFP_DMA | __GFP_DMA32);
+	if (xen_pv_domain()) {
+		if (dma_mask == (__GFP_DMA | __GFP_DMA32))
+			gfp_mask &= ~(__GFP_DMA | __GFP_DMA32);
+	}
 	nr_pages = (area->size - PAGE_SIZE) >> PAGE_SHIFT;
 	array_size = (nr_pages * sizeof(struct page *));
 
@@ -1612,6 +1618,16 @@ static void *__vmalloc_area_node(struct vm_struct *area, gfp_t gfp_mask,
 			goto fail;
 		}
 		area->pages[i] = page;
+		if (xen_pv_domain()) {
+			if (dma_mask) {
+				if (xen_limit_pages_to_max_mfn(page, 0, 32)) {
+					area->nr_pages = i + 1;
+					goto fail;
+				}
+			if (gfp_mask & __GFP_ZERO)
+				clear_highpage(page);
+			}
+		}
 	}
 
 	if (map_vm_area(area, prot, &pages))
--
1.7.7.6


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: Load increase after memory upgrade (part2)
  2012-06-14  7:07                                                                           ` Jan Beulich
  2012-06-14 18:33                                                                             ` Konrad Rzeszutek Wilk
@ 2012-06-14 18:43                                                                             ` Carsten Schiers
  1 sibling, 0 replies; 66+ messages in thread
From: Carsten Schiers @ 2012-06-14 18:43 UTC (permalink / raw)
  To: Jan Beulich, Konrad Rzeszutek Wilk
  Cc: Konrad Rzeszutek Wilk, xen-devel, Sander Eikelenboom

It's a 64-bit kernel...

-----Original Message-----
From: Jan Beulich [mailto:JBeulich@suse.com] 
Sent: Thursday, 14 June 2012 09:08
To: Konrad Rzeszutek Wilk
Cc: Konrad Rzeszutek Wilk; Sander Eikelenboom; xen-devel; Carsten Schiers
Subject: Re: [Xen-devel] Load increase after memory upgrade (part2)

>>> On 13.06.12 at 18:55, Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> wrote:
> @@ -1576,7 +1578,11 @@ static void *__vmalloc_area_node(struct vm_struct *area, gfp_t gfp_mask,
>  	struct page **pages;
>  	unsigned int nr_pages, array_size, i;
>  	gfp_t nested_gfp = (gfp_mask & GFP_RECLAIM_MASK) | __GFP_ZERO;
> -
> +	gfp_t dma_mask = gfp_mask & (__GFP_DMA | __GFP_DMA32);
> +	if (xen_pv_domain()) {
> +		if (dma_mask == (__GFP_DMA | __GFP_DMA32))

As said in an earlier reply - without having any place that would ever set both flags at once, this whole conditional is meaningless.
In our code - which I suppose is where you cloned this from - we set GFP_VMALLOC32 to such a value for 32-bit kernels (which otherwise would merely use GFP_KERNEL, and hence not trigger the code calling xen_limit_pages_to_max_mfn()). I don't recall though whether Carsten's problem was on a 32- or 64-bit kernel.

Jan

> +			gfp_mask &= ~(__GFP_DMA | __GFP_DMA32);
> +	}
>  	nr_pages = (area->size - PAGE_SIZE) >> PAGE_SHIFT;
>  	array_size = (nr_pages * sizeof(struct page *));
>  




^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: Load increase after memory upgrade (part2)
  2012-06-14 18:40                                                                           ` Carsten Schiers
@ 2012-06-14 19:16                                                                             ` Carsten Schiers
  0 siblings, 0 replies; 66+ messages in thread
From: Carsten Schiers @ 2012-06-14 19:16 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk
  Cc: Konrad Rzeszutek Wilk, xen-devel, Jan Beulich, Sander Eikelenboom

OK, found the problem in the patch file; baking 3.4.2 now... BR, Carsten.

-----Original Message-----
From: xen-devel-bounces@lists.xen.org [mailto:xen-devel-bounces@lists.xen.org] On Behalf Of Carsten Schiers
Sent: Thursday, 14 June 2012 20:40
To: Konrad Rzeszutek Wilk
Cc: Konrad Rzeszutek Wilk; xen-devel; Jan Beulich; Sander Eikelenboom
Subject: Re: [Xen-devel] Load increase after memory upgrade (part2)

Konrad, against which kernel version did you produce this patch? It will not apply cleanly to 3.4.2 at least; I will look for some older version now...

-----Original Message-----
From: xen-devel-bounces@lists.xen.org [mailto:xen-devel-bounces@lists.xen.org] On Behalf Of Konrad Rzeszutek Wilk
Sent: Wednesday, 13 June 2012 18:55
To: Carsten Schiers
Cc: Konrad Rzeszutek Wilk; xen-devel; Jan Beulich; Sander Eikelenboom
Subject: Re: [Xen-devel] Load increase after memory upgrade (part2)

On Fri, May 11, 2012 at 03:41:38PM -0400, Konrad Rzeszutek Wilk wrote:
> On Fri, May 11, 2012 at 11:39:08AM +0200, Carsten Schiers wrote:
> > Hi Konrad,
> > 
> >  
> > don't want to be pushy, as I have no real issue. I simply use the Xenified kernel or take the double load. 
> > 
> > But I think this mystery is still open. My last status was that the 
> > latest patch you produced resulted in a BUG,
> 
> Yes, that is right. Thank you for reminding me.
> > 
> > so we still have not checked whether our theory is correct.
> 
> No, we haven't. And I should have no trouble reproducing this. I can 
> just write a tiny module that allocates vmalloc_32().

Done. Found some bugs... and here is a new version. Can you please try it out? It has #define DEBUG 1 set, so it should print a lot of stuff when the DVB module loads. If it crashes, please send me the full log.

Thanks.
>From 5afb4ab1fb3d2b059fe1a6db93ab65cb76f43b8a Mon Sep 17 00:00:00 2001
From: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Date: Thu, 31 May 2012 14:21:04 -0400
Subject: [PATCH] xen/vmalloc_32: Use xen_exchange_.. when GFP flags are DMA.
 [v3]

Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
---
 arch/x86/xen/mmu.c    |  187 +++++++++++++++++++++++++++++++++++++++++++++++-
 include/xen/xen-ops.h |    2 +
 mm/vmalloc.c          |   18 +++++-
 3 files changed, 202 insertions(+), 5 deletions(-)

diff --git a/arch/x86/xen/mmu.c b/arch/x86/xen/mmu.c
index 3a73785..960d206 100644
--- a/arch/x86/xen/mmu.c
+++ b/arch/x86/xen/mmu.c
@@ -47,6 +47,7 @@
 #include <linux/gfp.h>
 #include <linux/memblock.h>
 #include <linux/seq_file.h>
+#include <linux/slab.h>
 
 #include <trace/events/xen.h>
 
@@ -2051,6 +2052,7 @@ void __init xen_init_mmu_ops(void)
 /* Protected by xen_reservation_lock. */
 #define MAX_CONTIG_ORDER 9 /* 2MB */
 static unsigned long discontig_frames[1<<MAX_CONTIG_ORDER];
+static unsigned long limited_frames[1<<MAX_CONTIG_ORDER];
 
 #define VOID_PTE (mfn_pte(0, __pgprot(0)))
 static void xen_zap_pfn_range(unsigned long vaddr, unsigned int order,
@@ -2075,6 +2077,42 @@ static void xen_zap_pfn_range(unsigned long vaddr, unsigned int order,
 	}
 	xen_mc_issue(0);
 }
+static int xen_zap_page_range(struct page *pages, unsigned int order,
+				unsigned long *in_frames,
+				unsigned long *out_frames,
+				void *limit_bitmap)
+{
+	int i, n = 0;
+	struct multicall_space mcs;
+	struct page *page;
+
+	xen_mc_batch();
+	for (i = 0; i < (1UL<<order); i++) {
+		if (!test_bit(i, limit_bitmap))
+			continue;
+
+		page = &pages[i];
+		mcs = __xen_mc_entry(0);
+#define DEBUG 1
+		if (in_frames) {
+#ifdef DEBUG
+			printk(KERN_INFO "%s:%d 0x%lx(pfn) 0x%lx (mfn) 0x%lx(vaddr)\n",
+				__func__, i, page_to_pfn(page),
+				pfn_to_mfn(page_to_pfn(page)), page_address(page));
+#endif
+			in_frames[i] = pfn_to_mfn(page_to_pfn(page));
+		}
+		MULTI_update_va_mapping(mcs.mc, (unsigned long)page_address(page), VOID_PTE, 0);
+		set_phys_to_machine(page_to_pfn(page), INVALID_P2M_ENTRY);
+
+		if (out_frames)
+			out_frames[i] = page_to_pfn(page);
+		++n;
+
+	}
+	xen_mc_issue(0);
+	return n;
+}
 
 /*
  * Update the pfn-to-mfn mappings for a virtual address range, either to
@@ -2118,6 +2156,53 @@ static void xen_remap_exchanged_ptes(unsigned long vaddr, int order,
 
 	xen_mc_issue(0);
 }
+static void xen_remap_exchanged_pages(struct page *pages, int order,
+				     unsigned long *mfns,
+				     unsigned long first_mfn, /* in_frame if we failed*/
+				     void *limit_map)
+{
+	unsigned i, limit;
+	unsigned long mfn;
+	struct page *page;
+
+	xen_mc_batch();
+
+	limit = 1ULL << order;
+	for (i = 0; i < limit; i++) {
+		struct multicall_space mcs;
+		unsigned flags;
+
+		if (!test_bit(i, limit_map))
+			continue;
+
+		page = &pages[i];
+		mcs = __xen_mc_entry(0);
+		if (mfns)
+			mfn = mfns[i];
+		else
+			mfn = first_mfn + i;
+
+		if (i < (limit - 1))
+			flags = 0;
+		else {
+			if (order == 0)
+				flags = UVMF_INVLPG | UVMF_ALL;
+			else
+				flags = UVMF_TLB_FLUSH | UVMF_ALL;
+		}
+#ifdef DEBUG
+		printk(KERN_INFO "%s (%d) pfn:0x%lx, pfn: 0x%lx vaddr: 0x%lx\n",
+			__func__, i, page_to_pfn(page), mfn, page_address(page));
+#endif
+		MULTI_update_va_mapping(mcs.mc, (unsigned long)page_address(page),
+				mfn_pte(mfn, PAGE_KERNEL), flags);
+
+		set_phys_to_machine(page_to_pfn(page), mfn);
+	}
+
+	xen_mc_issue(0);
+}
+
 
 /*
  * Perform the hypercall to exchange a region of our pfns to point to
@@ -2136,7 +2221,9 @@ static int xen_exchange_memory(unsigned long extents_in, unsigned int order_in,
 {
 	long rc;
 	int success;
-
+#ifdef DEBUG
+	int i;
+#endif
 	struct xen_memory_exchange exchange = {
 		.in = {
 			.nr_extents   = extents_in,
@@ -2157,7 +2244,11 @@ static int xen_exchange_memory(unsigned long extents_in, unsigned int order_in,
 
 	rc = HYPERVISOR_memory_op(XENMEM_exchange, &exchange);
 	success = (exchange.nr_exchanged == extents_in);
-
+#ifdef DEBUG
+	for (i = 0; i <  exchange.nr_exchanged; i++) {
+		printk(KERN_INFO "%s 0x%lx (mfn) <-> 0x%lx (mfn)\n",  __func__,pfns_in[i], mfns_out[i]);
+	}
+#endif
 	BUG_ON(!success && ((exchange.nr_exchanged != 0) || (rc == 0)));
 	BUG_ON(success && (rc != 0));
 
@@ -2231,8 +2322,8 @@ void xen_destroy_contiguous_region(unsigned long vstart, unsigned int order)
 	xen_zap_pfn_range(vstart, order, NULL, out_frames);
 
 	/* 3. Do the exchange for non-contiguous MFNs. */
-	success = xen_exchange_memory(1, order, &in_frame, 1UL << order,
-					0, out_frames, 0);
+	success = xen_exchange_memory(1, order, &in_frame,
+				      1UL << order, 0, out_frames, 0);
 
 	/* 4. Map new pages in place of old pages. */
 	if (success)
@@ -2244,6 +2335,94 @@ void xen_destroy_contiguous_region(unsigned long vstart, unsigned int order)
 }
 EXPORT_SYMBOL_GPL(xen_destroy_contiguous_region);
 
+int xen_limit_pages_to_max_mfn(struct page *pages, unsigned int order,
+			       unsigned int address_bits)
+{
+	unsigned long *in_frames = discontig_frames, *out_frames = limited_frames;
+	unsigned long  flags;
+	struct page *page;
+	int success;
+	int i, n = 0;
+	unsigned long _limit_map;
+	unsigned long *limit_map;
+
+	if (xen_feature(XENFEAT_auto_translated_physmap))
+		return 0;
+
+	if (unlikely(order > MAX_CONTIG_ORDER))
+		return -ENOMEM;
+
+	if ((1UL << order) > BITS_PER_LONG) {
+		limit_map = kzalloc(BITS_TO_LONGS(1U << order) *
+				    sizeof(*limit_map), GFP_KERNEL);
+		if (unlikely(!limit_map))
+			return -ENOMEM;
+	} else
+		limit_map = &_limit_map;
+
+	/* 0. Construct our per page bitmap lookup. */
+
+	if (address_bits && (address_bits < PAGE_SHIFT))
+			return -EINVAL;
+
+	if (order)
+		bitmap_zero(limit_map, 1U << order);
+	else
+		__set_bit(0, limit_map);
+
+	/* 1. Clear the pages */
+	for (i = 0; i < (1ULL << order); i++) {
+		void *vaddr;
+		page = &pages[i];
+
+		vaddr = page_address(page);
+#ifdef DEBUG
+		printk(KERN_INFO "%s: page: %p vaddr: %p 0x%lx(mfn) 0x%lx(pfn)\n",
+			__func__, page, vaddr, virt_to_mfn(vaddr), mfn_to_pfn(virt_to_mfn(vaddr)));
+#endif
+		if (address_bits) {
+			if (!(virt_to_mfn(vaddr) >> (address_bits - PAGE_SHIFT)))
+				continue;
+			__set_bit(i, limit_map);
+		}
+		if (!PageHighMem(page))
+			memset(vaddr, 0, PAGE_SIZE);
+		else {
+			memset(kmap(page), 0, PAGE_SIZE);
+			kunmap(page);
+			++n;
+		}
+	}
+	/* Check to see if we actually have to do any work. */
+	if (bitmap_empty(limit_map, 1U << order)) {
+		if (limit_map != &_limit_map)
+			kfree(limit_map);
+		return 0;
+	}
+	if (n)
+		kmap_flush_unused();
+
+	spin_lock_irqsave(&xen_reservation_lock, flags);
+
+	/* 2. Zap current PTEs. */
+	n = xen_zap_page_range(pages, order, in_frames, NULL /* out_frames */, limit_map);
+
+	/* 3. Do the exchange for non-contiguous MFNs. */
+	success = xen_exchange_memory(n, 0 /* this is always called per page */, in_frames,
+				      n, 0, out_frames, address_bits);
+
+	/* 4. Map new pages in place of old pages. */
+	if (success)
+		xen_remap_exchanged_pages(pages, order, out_frames, 0, limit_map);
+	else
+		xen_remap_exchanged_pages(pages, order, NULL, *in_frames, limit_map);
+
+	spin_unlock_irqrestore(&xen_reservation_lock, flags);
+	if (limit_map != &_limit_map)
+		kfree(limit_map);
+
+	return success ? 0 : -ENOMEM;
+}
+EXPORT_SYMBOL_GPL(xen_limit_pages_to_max_mfn);
 #ifdef CONFIG_XEN_PVHVM
 static void xen_hvm_exit_mmap(struct mm_struct *mm)
 {
diff --git a/include/xen/xen-ops.h b/include/xen/xen-ops.h
index 6a198e4..2f8709f 100644
--- a/include/xen/xen-ops.h
+++ b/include/xen/xen-ops.h
@@ -29,4 +29,6 @@ int xen_remap_domain_mfn_range(struct vm_area_struct *vma,
 			       unsigned long mfn, int nr,
 			       pgprot_t prot, unsigned domid);
 
+int xen_limit_pages_to_max_mfn(struct page *pages, unsigned int order,
+			       unsigned int address_bits);
 #endif /* INCLUDE_XEN_OPS_H */
diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index 2aad499..194af07 100644
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -31,6 +31,8 @@
 #include <asm/tlbflush.h>
 #include <asm/shmparam.h>
 
+#include <xen/xen.h>
+#include <xen/xen-ops.h>
 /*** Page table manipulation functions ***/
 
 static void vunmap_pte_range(pmd_t *pmd, unsigned long addr, unsigned long end)
@@ -1576,7 +1578,11 @@ static void *__vmalloc_area_node(struct vm_struct *area, gfp_t gfp_mask,
 	struct page **pages;
 	unsigned int nr_pages, array_size, i;
 	gfp_t nested_gfp = (gfp_mask & GFP_RECLAIM_MASK) | __GFP_ZERO;
-
+	gfp_t dma_mask = gfp_mask & (__GFP_DMA | __GFP_DMA32);
+	if (xen_pv_domain()) {
+		if (dma_mask == (__GFP_DMA | __GFP_DMA32))
+			gfp_mask &= ~(__GFP_DMA | __GFP_DMA32);
+	}
 	nr_pages = (area->size - PAGE_SIZE) >> PAGE_SHIFT;
 	array_size = (nr_pages * sizeof(struct page *));
 
@@ -1612,6 +1618,16 @@ static void *__vmalloc_area_node(struct vm_struct *area, gfp_t gfp_mask,
 			goto fail;
 		}
 		area->pages[i] = page;
+		if (xen_pv_domain()) {
+			if (dma_mask) {
+				if (xen_limit_pages_to_max_mfn(page, 0, 32)) {
+					area->nr_pages = i + 1;
+					goto fail;
+				}
+				if (gfp_mask & __GFP_ZERO)
+					clear_highpage(page);
+			}
+		}
 	}
 
 	if (map_vm_area(area, prot, &pages))
--
1.7.7.6
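
For illustration, the new export is meant to be used per page, the way the
vmalloc_32() hunk above uses it. A hypothetical out-of-tree caller
(alloc_dma32_page() is an invented name, shown only as a usage sketch
assuming the patch is applied) could look roughly like this:

#include <linux/gfp.h>
#include <linux/mm.h>
#include <xen/xen-ops.h>

/* Hypothetical helper: allocate one page and, on Xen PV, exchange its
 * backing frame for one whose machine address fits in 32 bits, so a
 * legacy DMA engine can reach it without bouncing through swiotlb. */
static struct page *alloc_dma32_page(void)
{
	struct page *page = alloc_page(GFP_KERNEL);

	if (!page)
		return NULL;
	/* order 0 (a single page), limit the MFN to 32 address bits */
	if (xen_limit_pages_to_max_mfn(page, 0, 32)) {
		__free_page(page);
		return NULL;
	}
	return page;
}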


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


^ permalink raw reply	[flat|nested] 66+ messages in thread

end of thread, other threads:[~2012-06-14 19:16 UTC | newest]

Thread overview: 66+ messages
2011-11-24 12:28 Load increase after memory upgrade (part2) Carsten Schiers
2011-11-25 18:42 ` Konrad Rzeszutek Wilk
2011-11-25 22:11   ` Carsten Schiers
2011-11-28 15:28     ` Konrad Rzeszutek Wilk
2011-11-28 15:40       ` Ian Campbell
2011-11-28 16:45         ` Konrad Rzeszutek Wilk
2011-11-29  8:31           ` Jan Beulich
2011-11-29  9:31             ` Carsten Schiers
2011-11-29  9:46           ` Carsten Schiers
2011-11-29 10:23           ` Ian Campbell
2011-11-29 15:33             ` Konrad Rzeszutek Wilk
2011-12-02 15:23               ` Konrad Rzeszutek Wilk
2011-12-04 11:59                 ` Carsten Schiers
2011-12-04 12:09                 ` Carsten Schiers
2011-12-06  3:26                   ` Konrad Rzeszutek Wilk
2011-12-14 20:23                     ` Konrad Rzeszutek Wilk
2011-12-14 22:07                       ` Konrad Rzeszutek Wilk
2011-12-15 14:52                         ` Carsten Schiers
2011-12-16 14:56                         ` Carsten Schiers
2011-12-16 15:04                           ` Konrad Rzeszutek Wilk
2011-12-16 15:51                             ` Carsten Schiers
2011-12-16 16:19                               ` Konrad Rzeszutek Wilk
2011-12-17 22:12                                 ` Carsten Schiers
2011-12-18  0:19                                   ` Sander Eikelenboom
2011-12-19 14:56                                     ` Konrad Rzeszutek Wilk
2012-01-10 21:55                                       ` Konrad Rzeszutek Wilk
2012-01-12 22:06                                         ` Sander Eikelenboom
2012-01-13  8:12                                           ` Jan Beulich
2012-01-13 15:13                                           ` Konrad Rzeszutek Wilk
2012-01-15 11:32                                             ` Sander Eikelenboom
2012-01-17 21:02                                               ` Konrad Rzeszutek Wilk
2012-01-18 11:28                                                 ` Pasi Kärkkäinen
2012-01-18 11:39                                                   ` Jan Beulich
2012-01-18 11:35                                                 ` Jan Beulich
2012-01-18 14:29                                                   ` Konrad Rzeszutek Wilk
2012-01-23 22:32                                                     ` Konrad Rzeszutek Wilk
2012-01-24  8:58                                                       ` Jan Beulich
2012-01-24 14:17                                                         ` Konrad Rzeszutek Wilk
2012-01-24 21:32                                                       ` Carsten Schiers
2012-01-25 12:02                                                       ` Carsten Schiers
2012-01-25 19:06                                                       ` Carsten Schiers
2012-01-25 21:02                                                         ` Konrad Rzeszutek Wilk
2012-02-15 19:28                                                         ` Konrad Rzeszutek Wilk
2012-02-16  8:56                                                           ` Jan Beulich
2012-02-17 15:07                                                             ` Konrad Rzeszutek Wilk
2012-02-28 14:35                                                               ` Carsten Schiers
2012-02-29 12:10                                                                 ` Carsten Schiers
2012-02-29 12:56                                                                   ` Carsten Schiers
2012-05-11  9:39                                                                     ` Carsten Schiers
2012-05-11 19:41                                                                       ` Konrad Rzeszutek Wilk
2012-06-13 16:55                                                                         ` Konrad Rzeszutek Wilk
2012-06-14  7:07                                                                           ` Jan Beulich
2012-06-14 18:33                                                                             ` Konrad Rzeszutek Wilk
2012-06-14 18:43                                                                             ` Carsten Schiers
2012-06-14  8:38                                                                           ` David Vrabel
2012-06-14 18:31                                                                             ` Konrad Rzeszutek Wilk
2012-06-14 18:40                                                                           ` Carsten Schiers
2012-06-14 19:16                                                                             ` Carsten Schiers
2011-12-19 14:54                                   ` Konrad Rzeszutek Wilk
2011-12-04 12:18                 ` Carsten Schiers
2011-11-28 16:58         ` Laszlo Ersek
2011-11-29  9:37         ` Carsten Schiers
2011-11-28 15:52       ` Carsten Schiers
2011-11-26  9:14   ` Carsten Schiers
2011-11-28 15:30     ` Konrad Rzeszutek Wilk
2011-11-29  9:42       ` Carsten Schiers
