* swiotlb detection should be memory hotplug aware ?
@ 2010-03-13  2:07 Alok Kataria
  2010-03-13  3:09 ` Andi Kleen
  0 siblings, 1 reply; 20+ messages in thread
From: Alok Kataria @ 2010-03-13  2:07 UTC (permalink / raw)
  To: Len Brown, Andi Kleen, the arch/x86 maintainers
  Cc: linux-acpi, LKML, Petr Vandrovec

Hi, 

Looking at the current code, swiotlb is initialized for 64bit kernels
only when the max_pfn value is greater than 4G (MAX_DMA32_PFN value).
So when the initial memory is less than 4GB, the kernel boots without
enabling swiotlb. When we then hotadd memory to such a kernel and go
beyond the 4G limit, swiotlb is still disabled. As a result, when any
32bit devices start using this newly added memory beyond 4G, the kernel
starts spitting error messages like the ones below, or in some cases it
panics.

<3>[ 815.921504] nommu_map_sg: overflow 32ffd6000+4096 of device mask ffffffff
<3>[ 815.944860] nommu_map_sg: overflow 32ffd6000+4096 of device mask ffffffff
<3>[ 815.968808] nommu_map_sg: overflow 32ffd6000+4096 of device mask ffffffff
<3>[ 815.992821] nommu_map_sg: overflow 32ffd6000+4096 of device mask ffffffff
<3>[ 816.016796] nommu_map_sg: overflow 32ffd6000+4096 of device mask ffffffff
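
The condition behind these messages can be sketched as a plain range
check against the device's DMA mask. This is a minimal user-space
illustration, not the kernel's code: the helper name dma_fits_mask() is
hypothetical, and the exact boundary handling in the kernel may differ.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: returns true if the buffer [addr, addr + size)
 * lies entirely within the range a device can address with 'mask'.
 */
static bool dma_fits_mask(uint64_t addr, size_t size, uint64_t mask)
{
    assert(mask != 0);  /* a zero mask would mean nothing is addressable */

    if (size == 0)
        return addr <= mask;

    uint64_t last = addr + (uint64_t)size - 1;  /* last byte of the buffer */
    if (last < addr)                            /* address arithmetic wrapped */
        return false;

    return last <= mask;
}
```

With the values from the log (address 0x32ffd6000, size 4096, mask
0xffffffff) the check fails, because the buffer sits well above 4G.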

For systems which have no HW-IOMMU but are capable of memory hotadd this
can be a potential problem. IMO, there are a few possible solutions to
this.

1. Enable swiotlb for all 64bit kernels which have memory hot-add
support.
2. Instead of checking the max_pfn value in pci_swiotlb_detect, check
for max_hotpluggable_pfn (or some such) value. Though I don't see such a
value readily available. I could parse the SRAT and get hotplug memory
information but that will make swiotlb detection logic a little too
complex. A quick look around srat_xx.c files and the acpi_memhotplug
module didn't find any useful API that could be used directly either.
So was wondering if any of you are aware of an easy way to get such
information ? 

Let me know if you have any other ideas as well.

Thanks in advance, 
Alok



* Re: swiotlb detection should be memory hotplug aware ?
  2010-03-13  2:07 swiotlb detection should be memory hotplug aware ? Alok Kataria
@ 2010-03-13  3:09 ` Andi Kleen
  2010-03-15 17:22   ` Alok Kataria
  2010-03-16  0:51   ` [LKML] " Konrad Rzeszutek Wilk
  0 siblings, 2 replies; 20+ messages in thread
From: Andi Kleen @ 2010-03-13  3:09 UTC (permalink / raw)
  To: akataria
  Cc: Len Brown, the arch/x86 maintainers, linux-acpi, LKML, Petr Vandrovec

, Alok Kataria wrote:

Hi Alok,

> Hi,
>
> Looking at the current code swiotlb is initialized for 64bit kernels
> only when the max_pfn value is greater than 4G (MAX_DMA32_PFN value).
> So in cases when the initial memory is less than 4GB the kernel boots
> without enabling swiotlb, when we hotadd memory to such a kernel and go
> beyond the 4G limit, swiotlb is still disabled. As a result when any
> 32bit devices start using this newly added memory beyond 4G, the kernel
> starts spitting error messages like below or in some cases it causes
> kernel panics.

Yes seems like a real problem.

>
> 1. Enable swiotlb for all 64bit kernels which have memory hot-add
> support.

I don't think that's a good idea. It would enable it everywhere on
distributions which compile with hotadd. Need (2)

> 2. Instead of checking the max_pfn value in pci_swiotlb_detect, check
> for max_hotpluggable_pfn (or some such) value. Though I don't see such a
> value readily available. I could parse the SRAT and get hotplug memory
> information but that will make swiotlb detection logic a little too
> complex. A quick look around srat_xx.c files and the acpi_memhotplug
> module didn't find any useful API that could be used directly either.
> So was wondering if any of you are aware of an easy way to get such
> information ?

I have a patchkit to revamp the SRAT parsing to store the hotadd information
more efficiently (the current way is pretty dumb)  I need to repost that.

With that it would be relatively easy to do I think.

-Andi


* Re: swiotlb detection should be memory hotplug aware ?
  2010-03-13  3:09 ` Andi Kleen
@ 2010-03-15 17:22   ` Alok Kataria
  2010-03-16  0:51   ` [LKML] " Konrad Rzeszutek Wilk
  1 sibling, 0 replies; 20+ messages in thread
From: Alok Kataria @ 2010-03-15 17:22 UTC (permalink / raw)
  To: Andi Kleen
  Cc: Len Brown, the arch/x86 maintainers, linux-acpi, LKML, Petr Vandrovec

Hi Andi, 

On Fri, 2010-03-12 at 19:09 -0800, Andi Kleen wrote:
> , Alok Kataria wrote:
> 
> Hi Alok,
> 
> > Hi,
> >
> > Looking at the current code swiotlb is initialized for 64bit kernels
> > only when the max_pfn value is greater than 4G (MAX_DMA32_PFN value).
> > So in cases when the initial memory is less than 4GB the kernel boots
> > without enabling swiotlb, when we hotadd memory to such a kernel and go
> > beyond the 4G limit, swiotlb is still disabled. As a result when any
> > 32bit devices start using this newly added memory beyond 4G, the kernel
> > starts spitting error messages like below or in some cases it causes
> > kernel panics.
> 
> Yes seems like a real problem.
> 
> >
> > 1. Enable swiotlb for all 64bit kernels which have memory hot-add
> > support.
> 
> I don't think that's a good idea. It would enable it everywhere on
> distributions which compile with hotadd. Need (2)
> 
> > 2. Instead of checking the max_pfn value in pci_swiotlb_detect, check
> > for max_hotpluggable_pfn (or some such) value. Though I don't see such a
> > value readily available. I could parse the SRAT and get hotplug memory
> > information but that will make swiotlb detection logic a little too
> > complex. A quick look around srat_xx.c files and the acpi_memhotplug
> > module didn't find any useful API that could be used directly either.
> > So was wondering if any of you are aware of an easy way to get such
> > information ?
> 
> I have a patchkit to revamp the SRAT parsing to store the hotadd information
> more efficiently (the current way is pretty dumb)  I need to repost that.

Can you please send me any pointers to that patch series. Also am just
curious to know about any merge plans for that patchkit.

Thanks,
Alok
> 
> With that it would be relatively easy to do I think.
> 
> -Andi



* Re: [LKML] Re: swiotlb detection should be memory hotplug aware ?
  2010-03-13  3:09 ` Andi Kleen
  2010-03-15 17:22   ` Alok Kataria
@ 2010-03-16  0:51   ` Konrad Rzeszutek Wilk
  2010-03-16  1:33     ` FUJITA Tomonori
  1 sibling, 1 reply; 20+ messages in thread
From: Konrad Rzeszutek Wilk @ 2010-03-16  0:51 UTC (permalink / raw)
  To: Andi Kleen
  Cc: akataria, Len Brown, the arch/x86 maintainers, linux-acpi, LKML,
	Petr Vandrovec

On Fri, Mar 12, 2010 at 07:09:41PM -0800, Andi Kleen wrote:
> , Alok Kataria wrote:
>
> Hi Alok,
>
>> Hi,
>>
>> Looking at the current code swiotlb is initialized for 64bit kernels
>> only when the max_pfn value is greater than 4G (MAX_DMA32_PFN value).
>> So in cases when the initial memory is less than 4GB the kernel boots
>> without enabling swiotlb, when we hotadd memory to such a kernel and go
>> beyond the 4G limit, swiotlb is still disabled. As a result when any
>> 32bit devices start using this newly added memory beyond 4G, the kernel
>> starts spitting error messages like below or in some cases it causes
>> kernel panics.
>
> Yes seems like a real problem.
>
>>
>> 1. Enable swiotlb for all 64bit kernels which have memory hot-add
>> support.
>
> I don't think that's a good idea. It would enable it everywhere on
> distributions which compile with hotadd. Need (2)
>
>> 2. Instead of checking the max_pfn value in pci_swiotlb_detect, check
>> for max_hotpluggable_pfn (or some such) value. Though I don't see such a
>> value readily available. I could parse the SRAT and get hotplug memory
>> information but that will make swiotlb detection logic a little too
>> complex. A quick look around srat_xx.c files and the acpi_memhotplug
>> module didn't find any useful API that could be used directly either.
>> So was wondering if any of you are aware of an easy way to get such
>> information ?
>
> I have a patchkit to revamp the SRAT parsing to store the hotadd information

There is a late mechanism to kick off the SWIOTLB. Perhaps the hot-add
could use swiotlb_init_late and start up the SWIOTLB?

> more efficiently (the current way is pretty dumb)  I need to repost that.
>
> With that it would be relatively easy to do I think.
>
> -Andi


* Re: [LKML] Re: swiotlb detection should be memory hotplug aware ?
  2010-03-16  0:51   ` [LKML] " Konrad Rzeszutek Wilk
@ 2010-03-16  1:33     ` FUJITA Tomonori
  2010-03-16 12:45       ` [LKML] " Konrad Rzeszutek Wilk
  0 siblings, 1 reply; 20+ messages in thread
From: FUJITA Tomonori @ 2010-03-16  1:33 UTC (permalink / raw)
  To: konrad.wilk; +Cc: ak, akataria, lenb, x86, linux-acpi, linux-kernel, petr

On Mon, 15 Mar 2010 20:51:40 -0400
Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> wrote:

> On Fri, Mar 12, 2010 at 07:09:41PM -0800, Andi Kleen wrote:
> > , Alok Kataria wrote:
> >
> > Hi Alok,
> >
> >> Hi,
> >>
> >> Looking at the current code swiotlb is initialized for 64bit kernels
> >> only when the max_pfn value is greater than 4G (MAX_DMA32_PFN value).
> >> So in cases when the initial memory is less than 4GB the kernel boots
> >> without enabling swiotlb, when we hotadd memory to such a kernel and go
> >> beyond the 4G limit, swiotlb is still disabled. As a result when any
> >> 32bit devices start using this newly added memory beyond 4G, the kernel
> >> starts spitting error messages like below or in some cases it causes
> >> kernel panics.
> >
> > Yes seems like a real problem.
> >
> >>
> >> 1. Enable swiotlb for all 64bit kernels which have memory hot-add
> >> support.
> >
> > I don't think that's a good idea. It would enable it everywhere on
> > distributions which compile with hotadd. Need (2)
> >
> >> 2. Instead of checking the max_pfn value in pci_swiotlb_detect, check
> >> for max_hotpluggable_pfn (or some such) value. Though I don't see such a
> >> value readily available. I could parse the SRAT and get hotplug memory
> >> information but that will make swiotlb detection logic a little too
> >> complex. A quick look around srat_xx.c files and the acpi_memhotplug
> >> module didn't find any useful API that could be used directly either.
> >> So was wondering if any of you are aware of an easy way to get such
> >> information ?
> >
> > I have a patchkit to revamp the SRAT parsing to store the hotadd information
> 
> There is a late mechanism to kick off the SWIOTLB. Perhaps the hot-add
> could use swiotlb_init_late and start up the SWIOTLB?

I guess that you are talking about
swiotlb_late_init_with_default_size(), which IA64 uses. However, you
can use swiotlb_late_init_with_default_size() only before we
initialize devices. Making it work after initializing devices is not
so easy, I think (that is, we need to change dma_ops).


* Re: [LKML] Re: [LKML] Re: swiotlb detection should be memory hotplug aware ?
  2010-03-16  1:33     ` FUJITA Tomonori
@ 2010-03-16 12:45       ` Konrad Rzeszutek Wilk
  2010-03-17 22:48         ` Alok Kataria
  0 siblings, 1 reply; 20+ messages in thread
From: Konrad Rzeszutek Wilk @ 2010-03-16 12:45 UTC (permalink / raw)
  To: FUJITA Tomonori; +Cc: ak, akataria, lenb, x86, linux-acpi, linux-kernel, petr

On Tue, Mar 16, 2010 at 10:33:20AM +0900, FUJITA Tomonori wrote:
> On Mon, 15 Mar 2010 20:51:40 -0400
> Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> wrote:
> 
> > On Fri, Mar 12, 2010 at 07:09:41PM -0800, Andi Kleen wrote:
> > > , Alok Kataria wrote:
> > >
> > > Hi Alok,
> > >
> > >> Hi,
> > >>
> > >> Looking at the current code swiotlb is initialized for 64bit kernels
> > >> only when the max_pfn value is greater than 4G (MAX_DMA32_PFN value).
> > >> So in cases when the initial memory is less than 4GB the kernel boots
> > >> without enabling swiotlb, when we hotadd memory to such a kernel and go
> > >> beyond the 4G limit, swiotlb is still disabled. As a result when any
> > >> 32bit devices start using this newly added memory beyond 4G, the kernel
> > >> starts spitting error messages like below or in some cases it causes
> > >> kernel panics.
> > >
> > > Yes seems like a real problem.
> > >
> > >>
> > >> 1. Enable swiotlb for all 64bit kernels which have memory hot-add
> > >> support.
> > >
> > > I don't think that's a good idea. It would enable it everywhere on
> > > distributions which compile with hotadd. Need (2)
> > >
> > >> 2. Instead of checking the max_pfn value in pci_swiotlb_detect, check
> > >> for max_hotpluggable_pfn (or some such) value. Though I don't see such a
> > >> value readily available. I could parse the SRAT and get hotplug memory
> > >> information but that will make swiotlb detection logic a little too
> > >> complex. A quick look around srat_xx.c files and the acpi_memhotplug
> > >> module didn't find any useful API that could be used directly either.
> > >> So was wondering if any of you are aware of an easy way to get such
> > >> information ?
> > >
> > > I have a patchkit to revamp the SRAT parsing to store the hotadd information
> > 
> > > There is a late mechanism to kick off the SWIOTLB. Perhaps the hot-add
> > could use swiotlb_init_late and start up the SWIOTLB?
> 
> I guess that you are talking about
> swiotlb_late_init_with_default_size(), which IA64 uses. However, you
> can use swiotlb_late_init_with_default_size() only before we
> initialize devices. Making it work after initializing devices is not
> so easy, I think (that is, we need to change dma_ops).

That is a good point. Especially if we have some outstanding DMA pages
allocated via dma_alloc_coherent.

I thought that machines with hot-add memory have their own fancy
IOMMU. For example the IBM x3955 (and its family) utilize the
Calgary IOMMU. The HP boxes utilize the Intel VT-D (or the AMD
equivalent).

So is this mostly specialized in the areas of virtualized guests? (Xen
PV guests with PCI passthrough suffer the same problem, btw).


* Re: [LKML] Re: [LKML] Re: swiotlb detection should be memory hotplug aware ?
  2010-03-16 12:45       ` [LKML] " Konrad Rzeszutek Wilk
@ 2010-03-17 22:48         ` Alok Kataria
  2010-07-20 22:14           ` Alok Kataria
  0 siblings, 1 reply; 20+ messages in thread
From: Alok Kataria @ 2010-03-17 22:48 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk
  Cc: FUJITA Tomonori, ak, lenb, x86, linux-acpi, linux-kernel, Petr Vandrovec


On Tue, 2010-03-16 at 05:45 -0700, Konrad Rzeszutek Wilk wrote:
> On Tue, Mar 16, 2010 at 10:33:20AM +0900, FUJITA Tomonori wrote:
> > On Mon, 15 Mar 2010 20:51:40 -0400
> > Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> wrote:
> > 
> > > On Fri, Mar 12, 2010 at 07:09:41PM -0800, Andi Kleen wrote:
> > > > , Alok Kataria wrote:
> > > >
> > > > Hi Alok,
> > > >
> > > >> Hi,
> > > >>
> > > >> Looking at the current code swiotlb is initialized for 64bit kernels
> > > >> only when the max_pfn value is greater than 4G (MAX_DMA32_PFN value).
> > > >> So in cases when the initial memory is less than 4GB the kernel boots
> > > >> without enabling swiotlb, when we hotadd memory to such a kernel and go
> > > >> beyond the 4G limit, swiotlb is still disabled. As a result when any
> > > >> 32bit devices start using this newly added memory beyond 4G, the kernel
> > > >> starts spitting error messages like below or in some cases it causes
> > > >> kernel panics.
> > > >
> > > > Yes seems like a real problem.
> > > >
> > > >>
> > > >> 1. Enable swiotlb for all 64bit kernels which have memory hot-add
> > > >> support.
> > > >
> > > > I don't think that's a good idea. It would enable it everywhere on
> > > > distributions which compile with hotadd. Need (2)
> > > >
> > > >> 2. Instead of checking the max_pfn value in pci_swiotlb_detect, check
> > > >> for max_hotpluggable_pfn (or some such) value. Though I don't see such a
> > > >> value readily available. I could parse the SRAT and get hotplug memory
> > > >> information but that will make swiotlb detection logic a little too
> > > >> complex. A quick look around srat_xx.c files and the acpi_memhotplug
> > > >> module didn't find any useful API that could be used directly either.
> > > >> So was wondering if any of you are aware of an easy way to get such
> > > >> information ?
> > > >
> > > > I have a patchkit to revamp the SRAT parsing to store the hotadd information
> > > 

Andi...ping any pointers to the patchkit. 

> > > There is a late mechanism to kick off the SWIOTLB. Perhaps the hot-add
> > > could use swiotlb_init_late and start up the SWIOTLB?

I don't see why we need to do this via late_init. The swiotlb detection
that happens through pci_swiotlb_detect is already late enough that SRAT
is already parsed. Or am I missing something ?
> > 
> > I guess that you are talking about
> > swiotlb_late_init_with_default_size(), which IA64 uses. However, you
> > can use swiotlb_late_init_with_default_size() only before we
> > initialize devices. Making it work after initializing devices is not
> > so easy, I think (that is, we need to change dma_ops).

> That is a good point. Especially if we have some outstanding DMA pages
> allocated via dma_alloc_coherent.
> 
> I thought that the machines that have hot-add memory they have their
> own fancy IOMMU. For example the IBM x3955 (and its family) utilize the
> Calgary IOMMU. The HP boxes utilize the Intel VT-D (or the AMD
> equivalent).
> So is this mostly specialized in the areas of virtualized guests? (Xen
> PV guests with PCI passthrough suffer the same problem, btw).


I am assuming that there were Intel based servers which supported memory
hot-add before VT-d too. So, IMO this is not specialized to
virtualization, though might be hard to prove if there are actual
physical machines out there which have similar constraints (no HWIOMMU +
MEMHOT add support)

Thanks,
Alok



* Re: swiotlb detection should be memory hotplug aware ?
  2010-03-17 22:48         ` Alok Kataria
@ 2010-07-20 22:14           ` Alok Kataria
  2010-07-21  4:58             ` FUJITA Tomonori
  0 siblings, 1 reply; 20+ messages in thread
From: Alok Kataria @ 2010-07-20 22:14 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk
  Cc: FUJITA Tomonori, ak, lenb, x86, linux-acpi, linux-kernel, Petr Vandrovec

Hi, 

Reviving a 4 month old thread. 
I am still waiting for any clues on this question below. 

>> 2. Instead of checking the max_pfn value in pci_swiotlb_detect, check
>> for max_hotpluggable_pfn (or some such) value. Though I don't see such a
>> value readily available. I could parse the SRAT and get hotplug memory
>> information but that will make swiotlb detection logic a little too
>> complex. A quick look around srat_xx.c files and the acpi_memhotplug
>> module didn't find any useful API that could be used directly either.
>> So was wondering if any of you are aware of an easy way to get such
>> information ?

Thanks,
Alok

On Wed, 2010-03-17 at 15:48 -0700, Alok Kataria wrote:
> On Tue, 2010-03-16 at 05:45 -0700, Konrad Rzeszutek Wilk wrote:
> > On Tue, Mar 16, 2010 at 10:33:20AM +0900, FUJITA Tomonori wrote:
> > > On Mon, 15 Mar 2010 20:51:40 -0400
> > > Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> wrote:
> > > 
> > > > On Fri, Mar 12, 2010 at 07:09:41PM -0800, Andi Kleen wrote:
> > > > > , Alok Kataria wrote:
> > > > >
> > > > > Hi Alok,
> > > > >
> > > > >> Hi,
> > > > >>
> > > > >> Looking at the current code swiotlb is initialized for 64bit kernels
> > > > >> only when the max_pfn value is greater than 4G (MAX_DMA32_PFN value).
> > > > >> So in cases when the initial memory is less than 4GB the kernel boots
> > > > >> without enabling swiotlb, when we hotadd memory to such a kernel and go
> > > > >> beyond the 4G limit, swiotlb is still disabled. As a result when any
> > > > >> 32bit devices start using this newly added memory beyond 4G, the kernel
> > > > >> starts spitting error messages like below or in some cases it causes
> > > > >> kernel panics.
> > > > >
> > > > > Yes seems like a real problem.
> > > > >
> > > > >>
> > > > >> 1. Enable swiotlb for all 64bit kernels which have memory hot-add
> > > > >> support.
> > > > >
> > > > > I don't think that's a good idea. It would enable it everywhere on
> > > > > distributions which compile with hotadd. Need (2)
> > > > >
> > > > >> 2. Instead of checking the max_pfn value in pci_swiotlb_detect, check
> > > > >> for max_hotpluggable_pfn (or some such) value. Though I don't see such a
> > > > >> value readily available. I could parse the SRAT and get hotplug memory
> > > > >> information but that will make swiotlb detection logic a little too
> > > > >> complex. A quick look around srat_xx.c files and the acpi_memhotplug
> > > > >> module didn't find any useful API that could be used directly either.
> > > > >> So was wondering if any of you are aware of an easy way to get such
> > > > >> information ?
> > > > >
> > > > > I have a patchkit to revamp the SRAT parsing to store the hotadd information
> > > > 
> 
> Andi...ping any pointers to the patchkit. 

> > > > There is a late mechanism to kick off the SWIOTLB. Perhaps the hot-add
> > > > could use swiotlb_init_late and start up the SWIOTLB?
> 
> I don't see why we need to do this via late_init, swiotlb detection that
> happens through pci_swiotlb_detect, is already late enough that SRAT is
> already parsed. Or am I missing something ?
> > > 
> > > I guess that you are talking about
> > > swiotlb_late_init_with_default_size(), which IA64 uses. However, you
> > > can use swiotlb_late_init_with_default_size() only before we
> > > initialize devices. Making it work after initializing devices is not
> > > so easy, I think (that is, we need to change dma_ops).
> 
> > That is a good point. Especially if we have some outstanding DMA pages
> > allocated via dma_alloc_coherent.
> > 
> > I thought that the machines that have hot-add memory they have their
> > own fancy IOMMU. For example the IBM x3955 (and its family) utilize the
> > Calgary IOMMU. The HP boxes utilize the Intel VT-D (or the AMD
> > equivalent).
> > So is this mostly specialized in the areas of virtualized guests? (Xen
> > PV guests with PCI passthrough suffer the same problem, btw).
> 
> 
> I am assuming that there were Intel based servers which supported memory
> hot-add before VT-d too. So, IMO this is not specialized to
> virtualization, though might be hard to prove if there are actual
> physical machines out there which have similar constraints (no HWIOMMU +
> MEMHOT add support)
> 
> Thanks,
> Alok



* Re: swiotlb detection should be memory hotplug aware ?
  2010-07-20 22:14           ` Alok Kataria
@ 2010-07-21  4:58             ` FUJITA Tomonori
  2010-07-21 17:13               ` Alok Kataria
  0 siblings, 1 reply; 20+ messages in thread
From: FUJITA Tomonori @ 2010-07-21  4:58 UTC (permalink / raw)
  To: akataria
  Cc: konrad.wilk, fujita.tomonori, ak, lenb, x86, linux-acpi,
	linux-kernel, petr

On Tue, 20 Jul 2010 15:14:57 -0700
Alok Kataria <akataria@vmware.com> wrote:

> Reviving a 4 month old thread. 
> I am still waiting for any clues on this question below. 

Basically, you want to add hot-plug memory and enable swiotlb, right?

We can't start swiotlb reliably after a system starts.

See dma32_reserve_bootmem() and dma32_free_bootmem(). What we do is
reserve a large chunk of memory in the DMA32 zone for swiotlb and
release it if we find that we don't need swiotlb. After a system
starts, we can't find enough memory for swiotlb in DMA32.
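
The reserve-early, release-if-unused pattern described above can be
sketched roughly as follows. This is a user-space illustration only:
malloc()/free() stand in for the boot-time allocator, and the names
low_reserve, reserve_low_memory() and settle_reservation() are
hypothetical, not kernel interfaces.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical bookkeeping for a buffer grabbed early, before it is known
 * whether bounce buffering will be needed. */
struct low_reserve {
    void  *buf;
    size_t size;
};

/* Early step: take the memory while it is still easy to get. */
static bool reserve_low_memory(struct low_reserve *r, size_t size)
{
    assert(r != NULL);
    r->buf = malloc(size);
    r->size = r->buf ? size : 0;
    return r->buf != NULL;
}

/*
 * Later step, once detection has run: keep the reservation if bounce
 * buffering is needed, otherwise hand it back. Returns true if kept.
 */
static bool settle_reservation(struct low_reserve *r, bool bounce_needed)
{
    assert(r != NULL);
    if (bounce_needed && r->buf)
        return true;

    free(r->buf);
    r->buf = NULL;
    r->size = 0;
    return false;
}
```

The point of the ordering is that the decision to keep the memory can be
deferred, but the allocation itself cannot: once the reservation has been
released, nothing guarantees that a similar region can be found again.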


* Re: swiotlb detection should be memory hotplug aware ?
  2010-07-21  4:58             ` FUJITA Tomonori
@ 2010-07-21 17:13               ` Alok Kataria
  2010-07-21 23:44                 ` FUJITA Tomonori
  0 siblings, 1 reply; 20+ messages in thread
From: Alok Kataria @ 2010-07-21 17:13 UTC (permalink / raw)
  To: FUJITA Tomonori
  Cc: konrad.wilk, ak, lenb, x86, linux-acpi, linux-kernel, Petr Vandrovec


On Tue, 2010-07-20 at 21:58 -0700, FUJITA Tomonori wrote:
> On Tue, 20 Jul 2010 15:14:57 -0700
> Alok Kataria <akataria@vmware.com> wrote:
> 
> > Reviving a 4 month old thread. 
> > I am still waiting for any clues on this question below. 
> 
> Basically, you want to add hot-plug memory and enable swiotlb, right?

Not really, I am planning to do something like this, 

@@ -52,7 +52,7 @@ int __init pci_swiotlb_detect(void)
 
 	/* don't initialize swiotlb if iommu=off (no_iommu=1) */
 #ifdef CONFIG_X86_64
-	if (!no_iommu && max_pfn > MAX_DMA32_PFN)
+	if (!no_iommu && (max_pfn > MAX_DMA32_PFN || hotplug_possible()))
 		swiotlb = 1;
 #endif
 	if (swiotlb_force)

BUT, I don't know what that hotplug_possible function would look like, or
if such an interface already exists in the kernel (my search didn't turn
up any).

IMO, it should be possible to go read the SRAT to see if this system has
support for hotplug memory and then enable swiotlb if it does.

Sounds right ?
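
To make the idea concrete, here is a rough user-space sketch of what such
a check could look like. It is only an illustration under assumptions:
the mem_range struct and the two-argument signature are hypothetical
(the patch above calls hotplug_possible() with no arguments), and the
table is assumed to have been filled from the SRAT memory affinity
entries already. The flag bits follow the ACPI definition (bit 0
enabled, bit 1 hot pluggable).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Flag bits of an SRAT memory affinity entry, per the ACPI spec. */
#define SRAT_MEM_ENABLED        (1u << 0)
#define SRAT_MEM_HOT_PLUGGABLE  (1u << 1)

#define FOUR_GB (1ULL << 32)

/* Hypothetical, already-parsed view of one SRAT memory affinity entry. */
struct mem_range {
    uint64_t base;    /* start physical address */
    uint64_t length;  /* size in bytes */
    uint32_t flags;   /* SRAT_MEM_* bits */
};

/*
 * Returns true if any enabled, hot-pluggable range extends above 4GB,
 * i.e. memory beyond the reach of 32-bit DMA could show up later.
 */
static bool hotplug_possible(const struct mem_range *ranges, size_t count)
{
    const uint32_t want = SRAT_MEM_ENABLED | SRAT_MEM_HOT_PLUGGABLE;

    assert(ranges != NULL || count == 0);

    for (size_t i = 0; i < count; i++) {
        if ((ranges[i].flags & want) != want)
            continue;                 /* not enabled or not hot-pluggable */
        if (ranges[i].length == 0)
            continue;                 /* empty entry */
        /* Overflow-safe form of: base + length > FOUR_GB */
        if (ranges[i].base >= FOUR_GB ||
            ranges[i].length > FOUR_GB - ranges[i].base)
            return true;
    }
    return false;
}
```

Checking the range end against 4GB, rather than just the presence of a
hot-pluggable entry, keeps swiotlb off on systems whose hot-add window
stays entirely below the 32-bit limit.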

Thanks,
Alok

> 
> We can't start swiotlb reliably after a system starts.
> 
> See dma32_reserve_bootmem() and dma32_free_bootmem(). What we do is
> reserving huge memory in DMA32 zone for swiotlb and releasing it if we
> find that we don't need swiotlb. We can't find enough memory for
> swiotlb in dma32 after a system starts.



* Re: swiotlb detection should be memory hotplug aware ?
  2010-07-21 17:13               ` Alok Kataria
@ 2010-07-21 23:44                 ` FUJITA Tomonori
  2010-07-22  0:03                   ` FUJITA Tomonori
  0 siblings, 1 reply; 20+ messages in thread
From: FUJITA Tomonori @ 2010-07-21 23:44 UTC (permalink / raw)
  To: akataria
  Cc: fujita.tomonori, konrad.wilk, ak, lenb, x86, linux-acpi,
	linux-kernel, petr

On Wed, 21 Jul 2010 10:13:34 -0700
Alok Kataria <akataria@vmware.com> wrote:

> > Basically, you want to add hot-plug memory and enable swiotlb, right?
> 
> Not really, I am planning to do something like this, 
> 
> @@ -52,7 +52,7 @@ int __init pci_swiotlb_detect(void)
>  
>  	/* don't initialize swiotlb if iommu=off (no_iommu=1) */
>  #ifdef CONFIG_X86_64
> -	if (!no_iommu && max_pfn > MAX_DMA32_PFN)
> +	if (!no_iommu && (max_pfn > MAX_DMA32_PFN || hotplug_possible()))
>  		swiotlb = 1;

Always enable swiotlb with memory hotplug enabled? Wasting 64MB on a
x86_64 system with 128MB doesn't look to be a good idea. I don't think
that there is an easy solution for this issue though.


* Re: swiotlb detection should be memory hotplug aware ?
  2010-07-21 23:44                 ` FUJITA Tomonori
@ 2010-07-22  0:03                   ` FUJITA Tomonori
  2010-07-22 18:34                     ` Alok Kataria
  0 siblings, 1 reply; 20+ messages in thread
From: FUJITA Tomonori @ 2010-07-22  0:03 UTC (permalink / raw)
  To: akataria; +Cc: konrad.wilk, ak, lenb, x86, linux-acpi, linux-kernel, petr

On Thu, 22 Jul 2010 08:44:42 +0900
FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp> wrote:

> On Wed, 21 Jul 2010 10:13:34 -0700
> Alok Kataria <akataria@vmware.com> wrote:
> 
> > > Basically, you want to add hot-plug memory and enable swiotlb, right?
> > 
> > Not really, I am planning to do something like this, 
> > 
> > @@ -52,7 +52,7 @@ int __init pci_swiotlb_detect(void)
> >  
> >  	/* don't initialize swiotlb if iommu=off (no_iommu=1) */
> >  #ifdef CONFIG_X86_64
> > -	if (!no_iommu && max_pfn > MAX_DMA32_PFN)
> > +	if (!no_iommu && (max_pfn > MAX_DMA32_PFN || hotplug_possible()))
> >  		swiotlb = 1;
> 
> Always enable swiotlb with memory hotplug enabled? Wasting 64MB on a
> x86_64 system with 128MB doesn't look to be a good idea. I don't think
> that there is an easy solution for this issue though.

btw, you need more work to enable switching on the fly.

You need to change the dma_ops pointer (see get_dma_ops()). It means
that you need to track outstanding dma operations per device, locking,
etc.
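
A toy illustration of why the tracking matters, under stated
assumptions: the types and functions below (dma_ops_sketch,
device_sketch, sketch_map(), sketch_unmap(), sketch_switch_ops()) are
hypothetical stand-ins, not the kernel's structures, and locking is left
out entirely.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for a set of DMA mapping operations. */
struct dma_ops_sketch {
    const char *name;
};

/* Hypothetical per-device state: current ops plus a count of live mappings. */
struct device_sketch {
    const struct dma_ops_sketch *ops;
    unsigned int outstanding;
};

/* Every mapping made through the current ops must be undone by the same ops. */
static void sketch_map(struct device_sketch *dev)
{
    dev->outstanding++;
}

static void sketch_unmap(struct device_sketch *dev)
{
    if (dev->outstanding)
        dev->outstanding--;
}

/*
 * Switching is only safe when no mapping created by the old ops is still
 * live; otherwise the later unmap would go to the wrong implementation.
 * A real version would also need locking against concurrent map/unmap.
 */
static int sketch_switch_ops(struct device_sketch *dev,
                             const struct dma_ops_sketch *new_ops)
{
    assert(dev != NULL && new_ops != NULL);
    if (dev->outstanding)
        return -EBUSY;
    dev->ops = new_ops;
    return 0;
}
```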


* Re: swiotlb detection should be memory hotplug aware ?
  2010-07-22  0:03                   ` FUJITA Tomonori
@ 2010-07-22 18:34                     ` Alok Kataria
  2010-07-23 14:22                       ` Konrad Rzeszutek Wilk
  0 siblings, 1 reply; 20+ messages in thread
From: Alok Kataria @ 2010-07-22 18:34 UTC (permalink / raw)
  To: FUJITA Tomonori
  Cc: konrad.wilk, ak, lenb, x86, linux-acpi, linux-kernel, Petr Vandrovec

Hi,

On Wed, 2010-07-21 at 17:03 -0700, FUJITA Tomonori wrote:
> On Thu, 22 Jul 2010 08:44:42 +0900
> FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp> wrote:
> 
> > On Wed, 21 Jul 2010 10:13:34 -0700
> > Alok Kataria <akataria@vmware.com> wrote:
> > 
> > > > Basically, you want to add hot-plug memory and enable swiotlb, right?
> > > 
> > > Not really, I am planning to do something like this, 
> > > 
> > > @@ -52,7 +52,7 @@ int __init pci_swiotlb_detect(void)
> > >  
> > >  	/* don't initialize swiotlb if iommu=off (no_iommu=1) */
> > >  #ifdef CONFIG_X86_64
> > > -	if (!no_iommu && max_pfn > MAX_DMA32_PFN)
> > > +	if (!no_iommu && (max_pfn > MAX_DMA32_PFN || hotplug_possible()))
> > >  		swiotlb = 1;
> > 
> > Always enable swiotlb with memory hotplug enabled?

yep though only on systems which have hotpluggable memory support.

>  Wasting 64MB on a
> > x86_64 system with 128MB doesn't look to be a good idea. I don't think
> > that there is an easy solution for this issue though.

Good, now that you agree that's the only feasible solution, do you
have any suggestions for any interfaces that are available from SRAT for
implementing hotplug_possible ?

> 
> btw, you need more work to enable switch on the fly.
> 
> You need to change the dma_ops pointer (see get_dma_ops()). It means
> that you need to track outstanding dma operations per device, locking,
> etc.

Yeah though if we are doing this during swiotlb_init time i.e. at bootup
as suggested in the pseudo patch, we don't need to worry about all this,
right ? 

Thanks,
Alok



* Re: swiotlb detection should be memory hotplug aware ?
  2010-07-22 18:34                     ` Alok Kataria
@ 2010-07-23 14:22                       ` Konrad Rzeszutek Wilk
  2010-07-23 14:33                         ` Andi Kleen
  0 siblings, 1 reply; 20+ messages in thread
From: Konrad Rzeszutek Wilk @ 2010-07-23 14:22 UTC (permalink / raw)
  To: Alok Kataria
  Cc: FUJITA Tomonori, ak, lenb, x86, linux-acpi, linux-kernel, Petr Vandrovec

On Thu, Jul 22, 2010 at 11:34:40AM -0700, Alok Kataria wrote:
> Hi,
> 
> On Wed, 2010-07-21 at 17:03 -0700, FUJITA Tomonori wrote:
> > On Thu, 22 Jul 2010 08:44:42 +0900
> > FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp> wrote:
> > 
> > > On Wed, 21 Jul 2010 10:13:34 -0700
> > > Alok Kataria <akataria@vmware.com> wrote:
> > > 
> > > > > Basically, you want to add hot-plug memory and enable swiotlb, right?
> > > > 
> > > > Not really, I am planning to do something like this, 
> > > > 
> > > > @@ -52,7 +52,7 @@ int __init pci_swiotlb_detect(void)
> > > >  
> > > >  	/* don't initialize swiotlb if iommu=off (no_iommu=1) */
> > > >  #ifdef CONFIG_X86_64
> > > > -	if (!no_iommu && max_pfn > MAX_DMA32_PFN)
> > > > +	if (!no_iommu && (max_pfn > MAX_DMA32_PFN || hotplug_possible()))
> > > >  		swiotlb = 1;
> > > 
> > > Always enable swiotlb with memory hotplug enabled?
> 
> yep though only on systems which have hotpluggable memory support.

What machines are there that have hotplug support and no hardware IOMMU?
I know of the IBM ones - but they use the Calgary IOMMU.
> 
> >  Wasting 64MB on a
> > > x86_64 system with 128MB doesn't look to be a good idea. I don't think
> > > that there is an easy solution for this issue though.
> 
> Good, now that you agree that this is the only feasible solution, do you
> have any suggestions for interfaces available from the SRAT for
> implementing hotplug_possible?

I thought SRAT has NUMA affinity information - so for example my AMD
desktop box has that, but it does not support hotplug capability.

I think first your 'hotplug_possible' code needs to be more specific -
not just check if SRAT exists, but also if there are swaths of memory
that are non-populated. It would also help if there was some indication
of whether the box truly does a hardware hotplug - is there a way to do
this?

> 
> > 
> > btw, you need more work to enable switch on the fly.
> > 
> > You need to change the dma_ops pointer (see get_dma_ops()). It means
> > that you need to track outstanding dma operations per device, locking,
> > etc.
> 
> Yeah though if we are doing this during swiotlb_init time i.e. at bootup
> as suggested in the pseudo patch, we don't need to worry about all this,
> right ? 

Right..
> 
> Thanks,
> Alok

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: swiotlb detection should be memory hotplug aware ?
  2010-07-23 14:22                       ` Konrad Rzeszutek Wilk
@ 2010-07-23 14:33                         ` Andi Kleen
  2010-07-23 14:59                           ` Konrad Rzeszutek Wilk
  0 siblings, 1 reply; 20+ messages in thread
From: Andi Kleen @ 2010-07-23 14:33 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk
  Cc: Alok Kataria, FUJITA Tomonori, lenb, x86, linux-acpi,
	linux-kernel, Petr Vandrovec


> I thought SRAT has NUMA affinity information - so for example my AMD
> desktop box has that, but it does not support hotplug capability.
>
> I think first your 'hotplug_possible' code needs to be more specific -
> not just check if SRAT exists, but also if there are swaths of memory
> that are non-populated. It would also help if there was some indication
> of whether the box truly does a hardware hotplug - is there a way to do
> this?

The SRAT declares hotplug memory ranges in advance. And Linux already
uses this information in the SRAT parser (just the code for doing this
is a bit dumb, I have a rewrite somewhere).

The only drawback is that some older systems claimed to have large
hotplug memory ranges when they didn't actually support it. So it's
better to not do anything with a lot of overhead.

So yes it would be reasonable to let swiotlb (and possibly other code
sizing itself based on memory) call into the SRAT parser and check the
hotplug ranges too.
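
A minimal userspace sketch of the check discussed here, not kernel code.
It assumes the SRAT memory affinity entries have already been parsed into
a simple array; struct srat_mem_range, hotplug_possible_above_4g() and
want_swiotlb() are hypothetical names used only for illustration. The two
flag bits follow the ACPI definition of the memory affinity structure
(bit 0 enabled, bit 1 hot-pluggable).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Flag bits of an ACPI SRAT memory affinity entry. */
#define SRAT_MEM_ENABLED       (1u << 0)
#define SRAT_MEM_HOT_PLUGGABLE (1u << 1)

#define PAGE_SHIFT    12
/* First page frame number above the 32-bit DMA limit (4GB). */
#define MAX_DMA32_PFN ((uint64_t)1 << (32 - PAGE_SHIFT))

/* Hypothetical, simplified view of one SRAT memory affinity entry. */
struct srat_mem_range {
	uint64_t base;   /* physical start address in bytes */
	uint64_t length; /* size in bytes */
	uint32_t flags;  /* SRAT_MEM_* bits */
};

/*
 * Hypothetical helper: true if any enabled, hot-pluggable range extends
 * past 4GB, i.e. memory could later appear where 32-bit devices cannot
 * reach it.
 */
static bool hotplug_possible_above_4g(const struct srat_mem_range *r, size_t n)
{
	const uint32_t want = SRAT_MEM_ENABLED | SRAT_MEM_HOT_PLUGGABLE;

	assert(r != NULL || n == 0);
	for (size_t i = 0; i < n; i++) {
		if ((r[i].flags & want) != want || r[i].length == 0)
			continue;
		uint64_t end_pfn = (r[i].base + r[i].length - 1) >> PAGE_SHIFT;
		if (end_pfn >= MAX_DMA32_PFN)
			return true;
	}
	return false;
}

/* Boot-time decision modelled on the pseudo patch quoted in this thread. */
static bool want_swiotlb(bool no_iommu, uint64_t max_pfn,
			 const struct srat_mem_range *r, size_t n)
{
	return !no_iommu &&
	       (max_pfn > MAX_DMA32_PFN || hotplug_possible_above_4g(r, n));
}
```

With 2GB populated at boot and a hot-pluggable range declared at 4GB-8GB,
want_swiotlb() returns true even though max_pfn is below MAX_DMA32_PFN.
The scan is a single pass over the table, so it stays cheap even on
systems that over-declare their hotplug ranges.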

BTW longer term swiotlb should be really more dynamic anyways and grow
and shrink on demand. I attempted this some time ago with my DMA
allocator patchkit, unfortunately that didn't go forward.

-Andi


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: swiotlb detection should be memory hotplug aware ?
  2010-07-23 14:33                         ` Andi Kleen
@ 2010-07-23 14:59                           ` Konrad Rzeszutek Wilk
  2010-07-23 15:23                             ` Andi Kleen
  0 siblings, 1 reply; 20+ messages in thread
From: Konrad Rzeszutek Wilk @ 2010-07-23 14:59 UTC (permalink / raw)
  To: Andi Kleen
  Cc: Alok Kataria, FUJITA Tomonori, lenb, x86, linux-acpi,
	linux-kernel, Petr Vandrovec

On Fri, Jul 23, 2010 at 04:33:32PM +0200, Andi Kleen wrote:
> 
> >I thought SRAT has NUMA affinity information - so for example my AMD
> >desktop box has that, but it does not support hotplug capability.
> >
> >I think first your 'hotplug_possible' code needs to be more specific -
> >not just check if SRAT exists, but also if there are swaths of memory
> >that are non-populated. It would also help if there was some indication
> >of whether the box truly does a hardware hotplug - is there a way to do
> >this?
> 
> The SRAT declares hotplug memory ranges in advance. And Linux already
> uses this information in the SRAT parser (just the code for doing this
> is a bit dumb, I have a rewrite somewhere).
> 
> The only drawback is that some older systems claimed to have large
> hotplug memory ranges when they didn't actually support it. So it's
> better to not do anything with a lot of overhead.
> 
> So yes it would be reasonable to let swiotlb (and possibly other code
> sizing itself based on memory) call into the SRAT parser and check the
> hotplug ranges too.
> 
> BTW longer term swiotlb should be really more dynamic anyways and grow
> and shrink on demand. I attempted this some time ago with my DMA

I was thinking about this at some point. I think the first step is to 
make SWIOTLB use the debugfs to actually print out how much of its
buffers are used - and see if the 64MB is a good fit.
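
A rough sketch of the kind of accounting such a debugfs file could
report, written as plain userspace C. The struct and function names are
hypothetical; the only figures taken from the thread are the 64MB default
size, and the slab count in the usage note assumes 2KB slabs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

struct bounce_stats {
	size_t total_slabs;  /* capacity of the bounce buffer, in slabs */
	size_t used;         /* slabs currently mapped */
	size_t used_hiwater; /* largest value 'used' has ever reached */
};

/* Accounts for a mapping of nslabs; returns false if the buffer is full. */
static bool stats_map(struct bounce_stats *s, size_t nslabs)
{
	assert(s != NULL);
	if (nslabs > s->total_slabs - s->used)
		return false;
	s->used += nslabs;
	if (s->used > s->used_hiwater)
		s->used_hiwater = s->used;
	return true;
}

/* Accounts for the release of a mapping of nslabs. */
static void stats_unmap(struct bounce_stats *s, size_t nslabs)
{
	assert(s != NULL && nslabs <= s->used);
	s->used -= nslabs;
}

/* Formats the counters the way a debugfs read handler might report them. */
static int stats_format(const struct bounce_stats *s, char *buf, size_t len)
{
	return snprintf(buf, len, "used %zu / %zu slabs, high-water %zu\n",
			s->used, s->total_slabs, s->used_hiwater);
}
```

With total_slabs set to 32768 (64MB in 2KB slabs), the high-water mark
after a representative workload shows directly how much of the 64MB was
ever needed.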

The shrinking part scares me - I think it might be more prudent to first
explore how to grow it. The big problem looks to be allocating a
physically contiguous set of pages. And I guess SWIOTLB would need to
change from using one big region to something of a pool system?

> allocator patchkit,
> unfortunately that didn't go forward.

I wasn't present at that time so I don't know what the issues were - you
wouldn't have a link to LKML for this?

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: swiotlb detection should be memory hotplug aware ?
  2010-07-23 14:59                           ` Konrad Rzeszutek Wilk
@ 2010-07-23 15:23                             ` Andi Kleen
  2010-07-28 10:10                               ` FUJITA Tomonori
  0 siblings, 1 reply; 20+ messages in thread
From: Andi Kleen @ 2010-07-23 15:23 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk
  Cc: Alok Kataria, FUJITA Tomonori, lenb, x86, linux-acpi,
	linux-kernel, Petr Vandrovec


> I was thinking about this at some point. I think the first step is to
> make SWIOTLB use the debugfs to actually print out how much of its
> buffers are used - and see if the 64MB is a good fit.

swiotlb is nearly always wrongly sized. For most systems it's far too
much, but for some not enough. I have some systemtap scripts around to
instrument it.

Also it depends on the IO load, so if you size it reasonably you
risk overflow on large IO (however these days this very rarely happens
because all "serious" IO devices don't need swiotlb anymore).

The other problem is that using only two bits for the needed address
space is also extremely inefficient (4GB and 16MB on x86). We really want
masks everywhere and to optimize for the actual requirements.
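
To illustrate what checking against the actual mask means, here is a
small standalone sketch. dma_bit_mask() and dma_range_fits() are
hypothetical helpers written for this example, not kernel functions; the
address and size are the ones from the nommu_map_sg overflow messages in
the first mail of this thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Builds a mask with the low n bits set, for 1 <= n <= 64. */
static uint64_t dma_bit_mask(unsigned int n)
{
	assert(n >= 1 && n <= 64);
	return n == 64 ? UINT64_MAX : (((uint64_t)1 << n) - 1);
}

/* True if the whole buffer [addr, addr + size) is addressable under mask. */
static bool dma_range_fits(uint64_t addr, uint64_t size, uint64_t mask)
{
	assert(size > 0);
	uint64_t last = addr + size - 1;

	if (last < addr) /* the range wrapped around the address space */
		return false;
	return last <= mask;
}
```

dma_range_fits(0x32ffd6000, 4096, dma_bit_mask(32)) is false, which is
the overflow reported above, while the same buffer fits a device with a
36-bit mask. An allocator driven by the mask could serve such a device
from anywhere below 64GB instead of forcing it into the 4GB zone.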

> The shrinking part scares me - I think it might be more prudent to first
> explore how to grow it. The big problem looks to be allocating a
> physically contiguous set of pages. And I guess SWIOTLB would need to change from
> using one big region to something of a pool system?
>

Shrinking: you define a movable zone, so with some delay it can always
be freed.

The problem with swiotlb, however, is that it still cannot block, but it
can adapt to load.

The real fix would be a blockable swiotlb, but the way drivers are set up
this is difficult (at least in kernels using spinlocks).

>> allocator patchkit,
>> unfortunately that didn't go forward.
> I wasn't present at that time so I don't know what the issues were - you
> wouldn't have a link to LKML for this?


There wasn't all that much opposition, but I ran out of time because
fixing the infrastructure (e.g. getting rid of all of GFP_DMA) is a lot
of work. In a sense it's a big tree sweep project like getting rid of
the BKL.

The old patch kit is at ftp://firstfloor.org/pub/ak/dma/
"intro" has the rationale.

I have a slightly newer version of the SCSI & misc drivers patchkit
somewhere.

-Andi


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: swiotlb detection should be memory hotplug aware ?
  2010-07-23 15:23                             ` Andi Kleen
@ 2010-07-28 10:10                               ` FUJITA Tomonori
  2010-07-28 11:09                                 ` Andi Kleen
  0 siblings, 1 reply; 20+ messages in thread
From: FUJITA Tomonori @ 2010-07-28 10:10 UTC (permalink / raw)
  To: ak
  Cc: konrad.wilk, akataria, fujita.tomonori, lenb, x86, linux-acpi,
	linux-kernel, petr

On Fri, 23 Jul 2010 17:23:33 +0200
Andi Kleen <ak@linux.intel.com> wrote:

> 
> > I was thinking about this at some point. I think the first step is to
> > make SWIOTLB use the debugfs to actually print out how much of its
> > buffers are used - and see if the 64MB is a good fit.
> 
> swiotlb is nearly always wrongly sized. For most systems it's far too
> much, but for some not enough. I have some systemtap scripts around to
> instrument it.

True, it's impossible to preallocate the best iotlb size statically.

 
> Also it depends on the IO load, so if you size it reasonably you
> risk overflow on large IO (however these days this very rarely happens
> because all "serious" IO devices don't need swiotlb anymore)

Yeah, nowadays it's pointless to try to get good performance with
swiotlb.


> The other problem is that using only two bits for the needed address
> space is also extremely inefficient (4GB and 16MB on x86). Really want
> masks everywhere and optimize for the actual requirements.

swiotlb doesn't allocate GFP_DMA memory. It handles only GFP_DMA32.

swiotlb doesn't work for drivers with some odd dma mask (non 32bit),
but we have lived with it, so I don't think that it's a big issue.


I think supporting dynamic expansion of swiotlb is enough. The
default swiotlb size, 64MB, is too large for the majority.

I have a half-baked patch for it. I'll send it later.

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: swiotlb detection should be memory hotplug aware ?
  2010-07-28 10:10                               ` FUJITA Tomonori
@ 2010-07-28 11:09                                 ` Andi Kleen
  2010-07-28 14:20                                   ` FUJITA Tomonori
  0 siblings, 1 reply; 20+ messages in thread
From: Andi Kleen @ 2010-07-28 11:09 UTC (permalink / raw)
  To: FUJITA Tomonori
  Cc: ak, konrad.wilk, akataria, lenb, x86, linux-acpi, linux-kernel, petr

FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp> writes:
>
>> The other problem is that using only two bits for the needed address
>> space is also extremely inefficient (4GB and 16MB on x86). Really want
>> masks everywhere and optimize for the actual requirements.
>
> swiotlb doesn't allocate GFP_DMA memory. It handles only GFP_DMA32.

I was lumping GFP_DMA and swiotlb together here. The
pci_alloc_consistent() function uses both interchangeably.
They really effectively are the same thing these days
and just separated by historical accident.


> I have a half-baked patch for it. I'll send it later.

The problem is still the *_map users, which usually cannot sleep,
and then it's difficult to grow.
For *_alloc it's relatively easy and to some extent already
implemented.

-Andi

-- 
ak@linux.intel.com -- Speaking for myself only.

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: swiotlb detection should be memory hotplug aware ?
  2010-07-28 11:09                                 ` Andi Kleen
@ 2010-07-28 14:20                                   ` FUJITA Tomonori
  0 siblings, 0 replies; 20+ messages in thread
From: FUJITA Tomonori @ 2010-07-28 14:20 UTC (permalink / raw)
  To: andi
  Cc: fujita.tomonori, ak, konrad.wilk, akataria, lenb, x86,
	linux-acpi, linux-kernel, petr

On Wed, 28 Jul 2010 13:09:57 +0200
Andi Kleen <andi@firstfloor.org> wrote:

> FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp> writes:
> >
> >> The other problem is that using only two bits for the needed address
> >> space is also extremely inefficient (4GB and 16MB on x86). Really want
> >> masks everywhere and optimize for the actual requirements.
> >
> > swiotlb doesn't allocate GFP_DMA memory. It handles only GFP_DMA32.
> 
> I was lumping GFP_DMA and swiotlb together here. The
> pci_alloc_consistent() function uses both interchangeably.
> They really effectively are the same thing these days
> and just separated by historical accident.

Sorry, I meant ZONE_DMA.

You are talking about your dma mask allocation patchset, right?

I meant that swiotlb doesn't need to handle ZONE_DMA. It handles only
devices that can handle ZONE_DMA32.


> > I have a half-baked patch for it. I'll send it later.
> 
> The problem is still the *_map users, which usually cannot sleep,
> and then it's difficult to grow.

Why can't we use GFP_NOWAIT?

My approach is to start small (like 4MB) and grow io_tbl in chunks of,
say, 4MB.
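
A toy userspace model of that approach, as a sketch only. The names
(struct bounce_pool, pool_grow(), pool_alloc(), pool_destroy()) and the
16-chunk cap are assumptions made for this example, and malloc() stands
in for a non-sleeping GFP_NOWAIT allocation. The real swiotlb would also
need physically contiguous memory below 4GB and per-slot freeing, which
this bump allocator ignores.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

#define CHUNK_BYTES ((size_t)4 << 20) /* grow in 4MB steps, as proposed */
#define MAX_CHUNKS  16                /* assumed cap: 16 * 4MB = 64MB */

struct bounce_pool {
	void  *chunk[MAX_CHUNKS];
	size_t used[MAX_CHUNKS]; /* bytes handed out from each chunk */
	size_t nchunks;
};

/* Adds one chunk; fails immediately instead of waiting for memory. */
static bool pool_grow(struct bounce_pool *p)
{
	if (p->nchunks == MAX_CHUNKS)
		return false;
	void *mem = malloc(CHUNK_BYTES); /* stand-in for GFP_NOWAIT */
	if (!mem)
		return false;
	p->chunk[p->nchunks] = mem;
	p->used[p->nchunks] = 0;
	p->nchunks++;
	return true;
}

/* Bump-allocates size bytes, growing the pool by one chunk if needed. */
static void *pool_alloc(struct bounce_pool *p, size_t size)
{
	assert(p != NULL);
	if (size == 0 || size > CHUNK_BYTES)
		return NULL;
	for (size_t i = 0; i < p->nchunks; i++) {
		if (CHUNK_BYTES - p->used[i] >= size) {
			void *ret = (char *)p->chunk[i] + p->used[i];
			p->used[i] += size;
			return ret;
		}
	}
	if (!pool_grow(p))
		return NULL; /* caller sees an overflow, never a sleep */
	size_t last = p->nchunks - 1;
	p->used[last] = size;
	return p->chunk[last];
}

/* Releases every chunk and resets the pool to empty. */
static void pool_destroy(struct bounce_pool *p)
{
	for (size_t i = 0; i < p->nchunks; i++)
		free(p->chunk[i]);
	p->nchunks = 0;
}
```

A zero-initialized pool starts with no memory at all; the first
pool_alloc() brings in one 4MB chunk, and further chunks appear only when
the existing ones are exhausted. Because growth either succeeds at once
or fails, the *_map path never has to sleep.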

^ permalink raw reply	[flat|nested] 20+ messages in thread

end of thread, other threads:[~2010-07-28 14:21 UTC | newest]

Thread overview: 20+ messages
2010-03-13  2:07 swiotlb detection should be memory hotplug aware ? Alok Kataria
2010-03-13  3:09 ` Andi Kleen
2010-03-15 17:22   ` Alok Kataria
2010-03-16  0:51   ` [LKML] " Konrad Rzeszutek Wilk
2010-03-16  1:33     ` FUJITA Tomonori
2010-03-16 12:45       ` [LKML] " Konrad Rzeszutek Wilk
2010-03-17 22:48         ` Alok Kataria
2010-07-20 22:14           ` Alok Kataria
2010-07-21  4:58             ` FUJITA Tomonori
2010-07-21 17:13               ` Alok Kataria
2010-07-21 23:44                 ` FUJITA Tomonori
2010-07-22  0:03                   ` FUJITA Tomonori
2010-07-22 18:34                     ` Alok Kataria
2010-07-23 14:22                       ` Konrad Rzeszutek Wilk
2010-07-23 14:33                         ` Andi Kleen
2010-07-23 14:59                           ` Konrad Rzeszutek Wilk
2010-07-23 15:23                             ` Andi Kleen
2010-07-28 10:10                               ` FUJITA Tomonori
2010-07-28 11:09                                 ` Andi Kleen
2010-07-28 14:20                                   ` FUJITA Tomonori
