* SWIOTLB allocates unneeded 64 MB buffer in guests
From: Benjamin Serebrin @ 2016-08-22 23:45 UTC
  To: kvm

Hi, kvm land,

The Linux SWIOTLB runs by default in our guest images (and likely all
Linux guests in any cloud), even though SWIOTLB will never actually be
used.  By default, the SWIOTLB allocates a 64 MB bounce buffer in
contiguous low memory that is never used, so that RAM is wasted.  I'd
like to gather opinions on how to tell guests not to bother wasting
that RAM.


Possible solutions that we've discussed internally:
 - We could provide an explicit indication that the guest can use to
decide not to allocate the SWIOTLB.
    - The easiest is a hypervisor leaf CPUID bit.
    - However, ACPI is arguably a more appropriate place for this kind
of platform information.  ACPI is a bit more involved to implement in
guest and BIOS; is it worth the trouble?
 - Let the guest infer that no SWIOTLB is needed, perhaps by detecting
the absence of ACPI hotplug slots, and (at late boot) all known PCIe
devices have DMA masks capable of addressing all of RAM.
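
The two options above (an explicit platform hint, or inference from the devices present) can be combined into one guest-side decision. This is a minimal C sketch, not kernel code: the `platform_hint_no_bounce` flag stands in for whatever CPUID bit or ACPI table would carry the hint, and `struct dev_info` and `swiotlb_needed()` are hypothetical names.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-device summary gathered at late boot. */
struct dev_info {
    const char *name;
    uint64_t dma_mask;   /* highest bus address the device can reach */
};

/*
 * Hypothetical decision helper: returns true if a bounce buffer should be
 * kept. max_phys is the highest guest-physical RAM address (inclusive).
 */
static bool swiotlb_needed(bool platform_hint_no_bounce,
                           const struct dev_info *devs, size_t ndevs,
                           bool hotplug_slots_present, uint64_t max_phys)
{
    /* Option 1: the platform explicitly promises no bounce buffering. */
    if (platform_hint_no_bounce)
        return false;

    /* Option 2: infer it. An empty hotplug slot could later receive a
     * device with a narrow DMA mask, so stay conservative. */
    if (hotplug_slots_present)
        return true;

    for (size_t i = 0; i < ndevs; i++) {
        if (devs[i].dma_mask < max_phys)
            return true;  /* this device cannot reach all of RAM */
    }
    return false;
}
```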

The timing is nice; we'll see if there's interest in holding a short
BoF at this week's KVM forum.

Thanks,
Ben

* Re: SWIOTLB allocates unneeded 64 MB buffer in guests
From: Yang Zhang @ 2016-08-24  2:44 UTC
  To: Benjamin Serebrin, kvm

On 2016/8/23 7:45, Benjamin Serebrin wrote:
> Hi, kvm land,
>
> The Linux SWIOTLB runs by default in our guest images (and likely all
> Linux guests in any cloud), even though SWIOTLB will never actually be
> used.  By default, the SWIOTLB allocates a 64 MB bounce buffer in
> contiguous low memory that is never used, so that RAM is wasted.  I'd
> like to gather opinions on how to tell guests not to bother wasting
> that RAM.

Can't the kernel's swiotlb parameter solve your problem? I usually
set swiotlb=1 to reduce the memory wasted on SWIOTLB.

-- 
Yang
Alibaba Cloud Computing

* Re: SWIOTLB allocates unneeded 64 MB buffer in guests
From: Benjamin Serebrin @ 2016-08-24 14:36 UTC
  To: Yang Zhang; +Cc: kvm

iommu=off would kill the SWIOTLB as well, while swiotlb=1 consumes 1MB.

However, maintaining guests' kernel commandlines is something we'd
like to stay away from if possible.  It's certainly a short-term
answer, or something individual customers can choose to do today.

Thanks!
Ben
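
For reference, the size of the bounce pool follows from the slab count passed as swiotlb=<n>. The C sketch below assumes the constants used by kernels of this era (2 KiB slabs, i.e. an IO_TLB_SHIFT of 11, with the slab count rounded up to a multiple of IO_TLB_SEGSIZE, 128); check them against the kernel actually in use. It counts only the bounce pool itself, not any other SWIOTLB bookkeeping, and the helper names are hypothetical.

```c
#include <stdint.h>

/* Assumed to match the kernel's values; verify for your kernel version. */
#define SLAB_SHIFT 11        /* 2 KiB per slab (IO_TLB_SHIFT) */
#define SEG_SLABS  128ULL    /* rounding granule (IO_TLB_SEGSIZE) */

/* Round a requested slab count up to a whole segment. */
static uint64_t swiotlb_slabs(uint64_t requested)
{
    return (requested + SEG_SLABS - 1) / SEG_SLABS * SEG_SLABS;
}

/* Bytes of bounce pool for a requested slab count. */
static uint64_t swiotlb_pool_bytes(uint64_t requested)
{
    return swiotlb_slabs(requested) << SLAB_SHIFT;
}
```

Under these assumptions the default 64 MB pool corresponds to 32768 slabs, and swiotlb=1 is rounded up to 128 slabs, i.e. 256 KiB of pool.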

* Re: SWIOTLB allocates unneeded 64 MB buffer in guests
From: Yang Zhang @ 2016-08-26  1:16 UTC
  To: Benjamin Serebrin; +Cc: kvm

On 2016/8/24 22:36, Benjamin Serebrin wrote:
> iommu=off would kill the SWIOTLB as well, while swiotlb=1 consumes 1MB.
>
> However, maintaining guests' kernel commandlines is something we'd
> like to stay away from if possible.  It's certainly a short-term

I don't quite understand why you want to stay away from the kernel
command line. It provides more flexibility, allowing you to turn it on or off yourself.

> answer, or something individual customers can choose to do today.

-- 
Yang
Alibaba Cloud Computing

* Re: SWIOTLB allocates unneeded 64 MB buffer in guests
From: Wanpeng Li @ 2016-08-26  2:45 UTC
  To: Yang Zhang; +Cc: Benjamin Serebrin, kvm

2016-08-26 9:16 GMT+08:00 Yang Zhang <yang.zhang.wz@gmail.com>:
> On 2016/8/24 22:36, Benjamin Serebrin wrote:
>>
>> iommu=off would kill the SWIOTLB as well, while swiotlb=1 consumes 1MB.
>>
>> However, maintaining guests' kernel commandlines is something we'd
>> like to stay away from if possible.  It's certainly a short-term
>
>
> I don't quite understand why you want to stay away from the kernel command
> line. It provides more flexibility, allowing you to turn it on or off yourself.

I agree with Benjamin: it would mean that customers have to tune their
guest OS kernel command lines, or that we supply guest images with
modified kernel command lines.

Regards,
Wanpeng Li

>
>
>> answer, or something individual customers can choose to do today.

* Re: SWIOTLB allocates unneeded 64 MB buffer in guests
From: Benjamin Serebrin @ 2016-08-29  6:36 UTC
  To: Wanpeng Li; +Cc: Yang Zhang, kvm

Thanks, all,

The general view from last week is to pursue an ACPI table that
indicates that the SWIOTLB isn't needed.  I'll work with our local
ACPI experts on table format.

For existing guests, we'll work on language suggesting kernel command
line options (iommu=off) if people are concerned, and will look into
doing the command line setting in our own provided images.
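
When baking iommu=off into provided images, a build-time check should match whole command-line tokens: a plain substring search for "iommu=off" would also match "intel_iommu=off", which is a different option. A minimal C sketch; the helper name cmdline_has_token() is hypothetical.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical helper: true if `token` appears as a whole space-separated
 * word in a kernel command line such as the contents of /proc/cmdline. */
static bool cmdline_has_token(const char *cmdline, const char *token)
{
    size_t len = strlen(token);
    const char *p = cmdline;

    if (len == 0)
        return false;

    while ((p = strstr(p, token)) != NULL) {
        bool starts_word = (p == cmdline) || (p[-1] == ' ');
        bool ends_word = (p[len] == '\0') || (p[len] == ' ') || (p[len] == '\n');
        if (starts_word && ends_word)
            return true;
        p += 1;  /* keep scanning past this partial match */
    }
    return false;
}
```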

* Re: SWIOTLB allocates unneeded 64 MB buffer in guests
From: Igor Mammedov @ 2016-09-12 11:55 UTC
  To: Benjamin Serebrin; +Cc: Wanpeng Li, Yang Zhang, kvm

On Sun, 28 Aug 2016 23:36:20 -0700
Benjamin Serebrin <serebrin@google.com> wrote:

> Thanks, all,
> 
> The general view from last week is to pursue an ACPI table that
> indicates that the SWIOTLB isn't needed.  I'll work with our local
> ACPI experts on table format.
Isn't SWIOTLB a Linux-specific implementation detail?
Suppose a guest is started without SWIOTLB and later the user hotplugs
a device that is not capable of handling high memory; what then?

Wouldn't it be better to have the kernel create/allocate SWIOTLB
on demand (i.e. when devices that require it are present)
instead of making the hardware (hypervisor) provide some obscure
ACPI table quirk to fix a kernel issue?

* Re: SWIOTLB allocates unneeded 64 MB buffer in guests
From: Benjamin Serebrin @ 2016-09-12 17:14 UTC
  To: Igor Mammedov; +Cc: Wanpeng Li, Yang Zhang, kvm

Sure, SWIOTLB is linux-specific but general bounce buffering isn't.

The idea is that the ACPI bit promises that the guest will not ever
need [SWIOTLB] bounce buffering.  That means either no hotplugging at
all, or no hotplugging of high-mem-incapable devices.  If our VMM ever
_adds_ a device to its catalog that's capable of hotplug but not
highmem, we'll clear the ACPI bit, for example.  I'm happy to discuss
and iterate over what promises are made by the ACPI bit if you'd like.
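
Read as a VMM-side policy, the promise above could look like the minimal C sketch below. The catalog structure and may_advertise_no_bounce() are hypothetical; the point is that the bit is derived from what the VMM could ever plug in, not only from what is present at boot.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical entry in the VMM's catalog of virtual device models. */
struct device_model {
    const char *name;
    bool hotpluggable;
    uint64_t dma_mask;   /* widest address the model can DMA to */
};

/* Hypothetical VMM policy: advertise "no bounce buffering needed" only if
 * every device present at boot, and every hotpluggable model the VMM could
 * add later, can reach the top of guest RAM (max_phys, inclusive). */
static bool may_advertise_no_bounce(const struct device_model *boot, size_t nboot,
                                    const struct device_model *catalog, size_t ncat,
                                    bool hotplug_enabled, uint64_t max_phys)
{
    for (size_t i = 0; i < nboot; i++)
        if (boot[i].dma_mask < max_phys)
            return false;

    if (!hotplug_enabled)
        return true;

    for (size_t i = 0; i < ncat; i++)
        if (catalog[i].hotpluggable && catalog[i].dma_mask < max_phys)
            return false;  /* a highmem-incapable hotpluggable model clears the bit */
    return true;
}
```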

The problem with dynamic allocation of the bounce buffer is that the
SWIOTLB code seems to demand contiguous low memory, and allocating
contiguous memory after boot is never guaranteed because of
fragmentation and subsequent pinning.  The original code seems to be
motivated by this: it does an early allocation of a contiguous low mem
and then a late deallocation if it determines that SWIOTLB is not
needed.  I imagine they wanted to cover cases where some high
mem-incapable device needed a contiguous target buffer because it had
no (or insufficient) scatter/gather capability.
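
The early-reserve, late-release pattern described above can be modeled in a few lines. This is a user-space C sketch, not the kernel's code: malloc() stands in for the early boot allocator of contiguous low memory, and struct bounce_pool, early_reserve() and late_release_if_unneeded() are hypothetical names. Once the block is released, getting a contiguous region of the same size back later is not guaranteed, which is the fragmentation concern raised above.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical model of a pool reserved before we know whether it's needed. */
struct bounce_pool {
    void *base;
    size_t size;
};

/* Early boot: memory is still unfragmented, so grab one contiguous block. */
static bool early_reserve(struct bounce_pool *pool, size_t size)
{
    pool->base = malloc(size);
    pool->size = pool->base ? size : 0;
    return pool->base != NULL;
}

/* Late boot: once all devices are known, give the block back if unused. */
static void late_release_if_unneeded(struct bounce_pool *pool, bool needed)
{
    if (!needed && pool->base) {
        free(pool->base);
        pool->base = NULL;
        pool->size = 0;
    }
}
```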

One could tie hot plug of a bounce-buffer-requiring virtual device to
causing SWIOTLB allocation, and fail the device initialization if the
required buffer couldn't be allocated.  I don't know of any new
virtual devices that require that, though, as high-mem-incapability is
hopefully only a vestige of very old virtual or real devices.  And the
plumbing complexity for doing this is much higher than seems
justified.

Thanks!
Ben

* Re: SWIOTLB allocates unneeded 64 MB buffer in guests
From: Paolo Bonzini @ 2016-09-12 20:14 UTC
  To: Benjamin Serebrin, Igor Mammedov; +Cc: Wanpeng Li, Yang Zhang, kvm



On 12/09/2016 19:14, Benjamin Serebrin wrote:
> 
> One could tie hot plug of a bounce-buffer-requiring virtual device to
> causing SWIOTLB allocation, and fail the device initialization if the
> required buffer couldn't be allocated.  I don't know of any new
> virtual devices that require that, though, as high-mem-incapability is
> hopefully only a vestige of very old virtual or real devices.

There are some devices (most importantly virtio 0.9) that use 32-bit
PFNs. They would require bounce buffers if the physical addresses are at
or above 2^44.

Paolo
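
The 2^44 figure follows from the page size: a 32-bit page frame number times a 4 KiB (2^12-byte) page addresses 2^32 x 2^12 = 2^44 bytes, or 16 TiB. A short C sketch of that check, assuming 4 KiB pages; max_addr_32bit_pfn() and pfn_fits_32bit() are hypothetical names.

```c
#include <stdbool.h>
#include <stdint.h>

#define PAGE_SHIFT_4K 12  /* assumes 4 KiB pages */

/* Highest byte address reachable through a 32-bit PFN: 2^44 - 1. */
static uint64_t max_addr_32bit_pfn(void)
{
    return (((uint64_t)UINT32_MAX + 1) << PAGE_SHIFT_4K) - 1;
}

/* True if the page holding `phys` can be described by a 32-bit PFN. */
static bool pfn_fits_32bit(uint64_t phys)
{
    return (phys >> PAGE_SHIFT_4K) <= UINT32_MAX;
}
```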

* Re: SWIOTLB allocates unneeded 64 MB buffer in guests
From: Benjamin Serebrin @ 2016-09-12 20:24 UTC
  To: Paolo Bonzini; +Cc: Igor Mammedov, Wanpeng Li, Yang Zhang, kvm

Yes, that's a bug in virtio 0.9, by the way: it should report a 44-bit
DMA mask rather than a 64-bit DMA mask.  That's on my list of things
to upstream.

I propose that VMs with more than 2^44 bytes (16TB) of RAM do not set the
"SWIOTLB unneeded" bit, and that those VMs can tolerate wasting 64MB
of RAM.  This does not seem like an onerous restriction.

Thanks,
Ben
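
In Linux, a device's addressing limit is expressed with the DMA_BIT_MASK(n) macro. The sketch below reproduces that macro's definition in standalone C to show what a 44-bit mask looks like, together with the proposed rule that a VM whose RAM reaches 2^44 should not advertise the hint; hint_allowed() is a hypothetical name.

```c
#include <stdbool.h>
#include <stdint.h>

/* Same definition as the kernel's DMA_BIT_MASK() in <linux/dma-mapping.h>. */
#define DMA_BIT_MASK(n) (((n) == 64) ? ~0ULL : ((1ULL << (n)) - 1))

/* Hypothetical rule from the proposal above: only advertise "SWIOTLB
 * unneeded" when all guest RAM lies below 2^44 (16 TiB), the limit of
 * devices that use 32-bit PFNs with 4 KiB pages. */
static bool hint_allowed(uint64_t ram_top_inclusive)
{
    return ram_top_inclusive <= DMA_BIT_MASK(44);
}
```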

* Re: SWIOTLB allocates unneeded 64 MB buffer in guests
From: Igor Mammedov @ 2016-09-13  9:47 UTC
  To: Benjamin Serebrin; +Cc: Wanpeng Li, Yang Zhang, kvm

On Mon, 12 Sep 2016 10:14:55 -0700
Benjamin Serebrin <serebrin@google.com> wrote:

> Sure, SWIOTLB is linux-specific but general bounce buffering isn't.
> 
> The idea is that the ACPI bit promises that the guest will not ever
> need [SWIOTLB] bounce buffering.  That means either no hotplugging at
> all, or no hotplugging of high-mem-incapable devices.  If our VMM ever
> _adds_ a device to its catalog that's capable of hotplug but not
> highmem, we'll clear the ACPI bit, for example.  I'm happy to discuss
> and iterate over what promises are made by the ACPI bit if you'd like.
The implication of the above is that you effectively push the kernel's
iommu=off option up the stack, where the stack would have to be configured
to disable hotplug (which, for example, is enabled by default in QEMU).
Also, every existing and future device would have to be modified to provide
a highmem-capability property so that the emulator/firmware could decide
whether the above ACPI table is necessary. That is doable if the emulator
generates the ACPI tables, but close to impossible (via standard interfaces)
if it's the firmware's job.

If hotplug is allowed by default and the SWIOTLB ACPI table is generated
at boot when no low-mem-only devices are present at boot,
then one would need to fix the kernel to try to allocate SWIOTLB dynamically
and to fail hotplug of a high-mem-incapable device if it's unable to do so.

Trying to save 64MB out of more than 4GB of memory at the above cost seems
a little bit excessive.

Another question:
why not run the emulator with an emulated IOMMU enabled? Then Linux uses
the real IOMMU dma_ops (intel/amd), and the 64MB for SWIOTLB is freed rather
than wasted, while 32-bit devices keep working.
Last time I tested it, it worked just fine for both coldplug and
hotplug cases, without the need to modify emulators or have the hardware
provide a SWIOTLB ACPI table.


> The problem with dynamic allocation of the bounce buffer is that the
> SWIOTLB code seems to demand contiguous low memory, and allocating
> contiguous memory after boot is never guaranteed because of
> fragmentation and subsequent pinning.  The original code seems to be
> motivated by this: it does an early allocation of a contiguous low mem
> and then a late deallocation if it determines that SWIOTLB is not
> needed.  I imagine they wanted to cover cases where some high
> mem-incapable device needed a contiguous target buffer because it had
> no (or insufficient) scatter/gather capability.
> 
> One could tie hot plug of a bounce-buffer-requiring virtual device to
> causing SWIOTLB allocation, and fail the device initialization if the
> required buffer couldn't be allocated.  I don't know of any new
> virtual devices that require that, though, as high-mem-incapability is
> hopefully only a vestige of very old virtual or real devices.  And the
> plumbing complexity for doing this is much higher than seems
> justified.
It possibly could be done in a centralized manner in the kernel when a
device driver initializes the DMA API, for example in
 dma_set_mask_and_coherent().
Even if that's done, it would be a regression if the kernel is unable to
allocate a bounce buffer on demand and device init fails where it
used to work with a preallocated SWIOTLB.
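
A user-space C model of the centralized on-demand idea above: when a driver sets a DMA mask that cannot reach all of RAM, try to allocate a bounce pool, and fail device initialization if that is impossible. The names (model_set_dma_mask(), struct model_dev, the global pool) are hypothetical, malloc() stands in for a contiguous low-memory allocation, and this is not the kernel's dma_set_mask_and_coherent(). The -ENOMEM path is the regression risk described above.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical model of one device and one shared, lazily created pool. */
struct model_dev {
    uint64_t dma_mask;
};

#define BOUNCE_POOL_SIZE ((size_t)64 << 20)  /* 64 MB, like the default */

static void *bounce_pool;  /* NULL until a narrow-mask device appears */

/* Hypothetical analogue of setting a device's DMA mask: allocate the bounce
 * pool on first use by a device that cannot reach all of RAM. */
static int model_set_dma_mask(struct model_dev *dev, uint64_t mask,
                              uint64_t max_phys)
{
    if (mask < max_phys && bounce_pool == NULL) {
        bounce_pool = malloc(BOUNCE_POOL_SIZE);
        if (bounce_pool == NULL)
            return -ENOMEM;  /* device init fails instead of bouncing */
    }
    dev->dma_mask = mask;
    return 0;
}
```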


* Re: SWIOTLB allocates unneeded 64 MB buffer in guests
From: Benjamin Serebrin @ 2016-09-19 16:32 UTC
  To: Igor Mammedov; +Cc: Wanpeng Li, Yang Zhang, kvm

On Tue, Sep 13, 2016 at 2:47 AM, Igor Mammedov <imammedo@redhat.com> wrote:
> On Mon, 12 Sep 2016 10:14:55 -0700
> Benjamin Serebrin <serebrin@google.com> wrote:
>
>> Sure, SWIOTLB is linux-specific but general bounce buffering isn't.
>>
>> The idea is that the ACPI bit promises that the guest will not ever
>> need [SWIOTLB] bounce buffering.  That means either no hotplugging at
>> all, or no hotplugging of high-mem-incapable devices.  If our VMM ever
>> _adds_ a device to its catalog that's capable of hotplug but not
>> highmem, we'll clear the ACPI bit, for example.  I'm happy to discuss
>> and iterate over what promises are made by the ACPI bit if you'd like.
> The implication of the above is that you effectively push the kernel's
> iommu=off option up the stack, where the stack would have to be configured
> to disable hotplug (which, for example, is enabled by default in QEMU).
> Also, every existing and future device would have to be modified to provide
> a highmem-capability property so that the emulator/firmware could decide
> whether the above ACPI table is necessary. That is doable if the emulator
> generates the ACPI tables, but close to impossible (via standard interfaces)
> if it's the firmware's job.
>
> If hotplug is allowed by default and the SWIOTLB ACPI table is generated
> at boot when no low-mem-only devices are present at boot,
> then one would need to fix the kernel to try to allocate SWIOTLB dynamically
> and to fail hotplug of a high-mem-incapable device if it's unable to do so.
>
> Trying to save 64MB out of more than 4GB of memory at the above cost seems
> a little bit excessive.

I don't recommend such complexity; I was proposing a hint bit in ACPI
as a simple promise from the hypervisor.

64MB is 1.5% of a 4GB machine.  We wanted an easy way to give it back
to the guest.

>
> Another question:
> why don't run emulator with emulated IOMMU enabled? Then linux uses
> real IOMMU dma_ops (intel/amd) and 64Mb for SWIOTLB are not wasted/freed
> while keeping 32-bit devices operational?
> Last time I tested it, it works just fine either for coldplug and
> hotplug cases without need to mess with emulators nor any hardware
> to provide SWIOTLB ACPI table.
>

An IOMMU comes with its own overheads; for example, until kernel v4.7,
where the speedups to the Intel IOMMU ops were merged, the guest
intel-iommu.c code had significant performance scalability issues.  I
would be more willing to try to get distros to backport a simple
no-SWIOTLB change than the fairly invasive IOMMU optimizations.  We'll
be living with many pre-4.7 guests for quite a while.
