All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH]irqchip/irq-gic-v3:Avoid a waste of LPI resource
       [not found] <8898674D84E3B24BA3A2D289B872026A69EECF90@G01JPEXMBKW03>
@ 2018-04-27 14:12 ` Marc Zyngier
  2018-04-30  7:53   ` Zhang, Lei
  0 siblings, 1 reply; 11+ messages in thread
From: Marc Zyngier @ 2018-04-27 14:12 UTC (permalink / raw)
  To: linux-arm-kernel

[fixing up the address of the LAK mailing list]

Hi Lei,

On 27/04/18 14:20, Zhang, Lei wrote:
> I want to talk about the implementation of GIC-ITS driver.
> The current implementation of irq-gic-v3-its driver allocates at 
> least 32 LPIs (interrupt numbers) for each Device ID even if the number of
> requested LPIs is only one.
> I think it is a waste for LPI resource.

It depends what your use case is. The use-case the ITS driver is
designed for is that you have far more LPIs than you have devices, and
each device is going to use far more than a single LPI.

That's pretty much the case of all systems that use PCI, where each
function tends to allocate one queue (and thus one MSI) per CPU.

Another benefit of allocating 32 LPIs in one go is that it guarantees
the right allocation for PCI devices requiring Multi-MSI.

> And if we use many devices over ITS, this implementation may cause a
> shortage of LPI .

How many is many? Are they PCI? Or something else?

> I have a patch to avoid this problem by reducing the number of LPIs from 32 to 2 per chunk.
> The points of this patch are as follows.
> Point1:Each Device ID can still allocate enough LPIs by increasing chunk number.
> Point2:The size of memory for the bitmap (lpi_bitmap) to manage the free chunks becomes 
> larger(256B -> 4KiB).But I think it is not a problem,
> because the max size of lpi_bitmap is 4KiB.

As it is, this patch will break Multi-MSI and some other platforms that
do require a higher allocation granule.

Depending on whether you're using PCI or some other bus, we can probably
come up with a solution that works for everyone. But I need more
information on this.

Thanks,

	M.

> 
> Signed-off-by: Lei Zhang <zhang.lei@jp.fujitsu.com>
> 
> The follow is my patch. the patch is based on kernel 4.16.5
> -------------------------------------
> --- drivers/irqchip/irq-gic-v3-its.c       2018-01-29 06:20:33.000000000 +0900
> +++ drivers/irqchip/irq-gic-v3-its.c    2018-04-25 15:05:26.078956980 +0900
> @@ -1406,12 +1406,12 @@
>   *
>   * The GIC has id_bits bits for interrupt identifiers. From there, we
>   * must subtract 8192 which are reserved for SGIs/PPIs/SPIs. Then, as
> - * we allocate LPIs by chunks of 32, we can shift the whole thing by 5
> + * we allocate LPIs by chunks of 2, we can shift the whole thing by 1
>   * bits to the right.
>   *
>   * This gives us (((1UL << id_bits) - 8192) >> 5) possible allocations.
>   */
> -#define IRQS_PER_CHUNK_SHIFT   5
> +#define IRQS_PER_CHUNK_SHIFT   1
>  #define IRQS_PER_CHUNK         (1 << IRQS_PER_CHUNK_SHIFT)
>  #define ITS_MAX_LPI_NRBITS     16 /* 64K LPIs */
> 
> Regards,
> Lei Zhang
> --
> e-mail: zhang.lei at jp.fujitsu.com 
> 
> 


-- 
Jazz is not dead. It just smells funny...

^ permalink raw reply	[flat|nested] 11+ messages in thread

* [PATCH]irqchip/irq-gic-v3:Avoid a waste of LPI resource
  2018-04-27 14:12 ` [PATCH]irqchip/irq-gic-v3:Avoid a waste of LPI resource Marc Zyngier
@ 2018-04-30  7:53   ` Zhang, Lei
  2018-04-30 10:20     ` Marc Zyngier
  2018-05-09 14:32     ` Mark Langsdorf
  0 siblings, 2 replies; 11+ messages in thread
From: Zhang, Lei @ 2018-04-30  7:53 UTC (permalink / raw)
  To: linux-arm-kernel

Hi Marc

thanks for your opinions.

> How many is many? Are they PCI? Or something else?
 
> As it is, this patch will break Multi-MSI and some other platforms that
> do require a higher allocation granule.
> 
> Depending on whether you're using PCI or some other bus, we can probably
> come up with a solution that works for everyone. But I need more
> information on this.

Actually it is our original interconnect device not on PCI but on our original bus.
This device has many sub devices around one thousand, and each sub device requires only a few LPIs.

As I explained in point1, each Device ID can still allocate enough LPIs more than IRQS_PER_CHUNK by increasing chunks.
So I couldn't understand why this patch will break Multi-MSI.

By the way, 32 seems a good default value for IRQS_PER_CHUNK.
So, I want to write another patch to make IRQ_PER_CHUNK a variable which can be changed by kernel parameter.
What do you think of this idea?


> -----Original Message-----
> From: Marc Zyngier [mailto:marc.zyngier at arm.com]
> Sent: Friday, April 27, 2018 11:13 PM
> To: Zhang, Lei/? ?; linux-arm-kernel at lists.infradead.org
> Subject: Re: [PATCH]irqchip/irq-gic-v3:Avoid a waste of LPI resource
> 
> [fixing up the address of the LAK mailing list]
> 
> Hi Lei,
> 
> On 27/04/18 14:20, Zhang, Lei wrote:
> > I want to talk about the implementation of GIC-ITS driver.
> > The current implementation of irq-gic-v3-its driver allocates at
> > least 32 LPIs (interrupt numbers) for each Device ID even if the number
> of
> > requested LPIs is only one.
> > I think it is a waste for LPI resource.
> 
> It depends what your use case is. The use-case the ITS driver is
> designed for is that you have far more LPIs than you have devices, and
> each device is going to use far more than a single LPI.
> 
> That's pretty much the case of all systems that use PCI, where each
> function tends to allocate one queue (and thus one MSI) per CPU.
> 
> Another benefit of allocating 32 LPIs in one go is that it guarantees
> the right allocation for PCI devices requiring Multi-MSI.
> 
> > And if we use many devices over ITS, this implementation may cause a
> > shortage of LPI .
> 
> How many is many? Are they PCI? Or something else?
> 
> > I have a patch to avoid this problem by reducing the number of LPIs
> from 32 to 2 per chunk.
> > The points of this patch are as follows.
> > Point1:Each Device ID can still allocate enough LPIs by increasing chunk
> number.
> > Point2:The size of memory for the bitmap (lpi_bitmap) to manage the
> free chunks becomes
> > larger(256B -> 4KiB).But I think it is not a problem,
> > because the max size of lpi_bitmap is 4KiB.
> 
> As it is, this patch will break Multi-MSI and some other platforms that
> do require a higher allocation granule.
> 
> Depending on whether you're using PCI or some other bus, we can probably
> come up with a solution that works for everyone. But I need more
> information on this.
> 
> Thanks,
> 
> 	M.
> 
> >
> > Signed-off-by: Lei Zhang <zhang.lei@jp.fujitsu.com>
> >
> > The follow is my patch. the patch is based on kernel 4.16.5
> > -------------------------------------
> > --- drivers/irqchip/irq-gic-v3-its.c       2018-01-29
> 06:20:33.000000000 +0900
> > +++ drivers/irqchip/irq-gic-v3-its.c    2018-04-25
> 15:05:26.078956980 +0900
> > @@ -1406,12 +1406,12 @@
> >   *
> >   * The GIC has id_bits bits for interrupt identifiers. From there,
> we
> >   * must subtract 8192 which are reserved for SGIs/PPIs/SPIs. Then,
> as
> > - * we allocate LPIs by chunks of 32, we can shift the whole thing by
> 5
> > + * we allocate LPIs by chunks of 2, we can shift the whole thing by
> 1
> >   * bits to the right.
> >   *
> >   * This gives us (((1UL << id_bits) - 8192) >> 5) possible allocations.
> >   */
> > -#define IRQS_PER_CHUNK_SHIFT   5
> > +#define IRQS_PER_CHUNK_SHIFT   1
> >  #define IRQS_PER_CHUNK         (1 << IRQS_PER_CHUNK_SHIFT)
> >  #define ITS_MAX_LPI_NRBITS     16 /* 64K LPIs */
> >
> > Regards,
> > Lei Zhang
> > --
> > e-mail: zhang.lei at jp.fujitsu.com
> >
> >
> 
> 
> --
> Jazz is not dead. It just smells funny...
> 

^ permalink raw reply	[flat|nested] 11+ messages in thread

* [PATCH]irqchip/irq-gic-v3:Avoid a waste of LPI resource
  2018-04-30  7:53   ` Zhang, Lei
@ 2018-04-30 10:20     ` Marc Zyngier
  2018-05-03  6:46       ` Zhang, Lei
  2018-05-09 14:32     ` Mark Langsdorf
  1 sibling, 1 reply; 11+ messages in thread
From: Marc Zyngier @ 2018-04-30 10:20 UTC (permalink / raw)
  To: linux-arm-kernel

Hi Lei,

On 30/04/18 08:53, Zhang, Lei wrote:
> Hi Marc
> 
> thanks for your opinions.
> 
>> How many is many? Are they PCI? Or something else?
>  
>> As it is, this patch will break Multi-MSI and some other platforms that
>> do require a higher allocation granule.
>>
>> Depending on whether you're using PCI or some other bus, we can probably
>> come up with a solution that works for everyone. But I need more
>> information on this.
> 
> Actually it is our original interconnect device not on PCI but on our
> original bus. This device has many sub devices around one thousand,
> and each sub device requires only a few LPIs.

OK. Have you implemented your own glue layer for this (like we already
have for PCI, platform and FSL-MC)? Or are you directly using the
platform MSI infrastructure?

> As I explained in point1, each Device ID can still allocate enough
> LPIs more than IRQS_PER_CHUNK by increasing chunks. So I couldn't
> understand why this patch will break Multi-MSI.

Unfortunately, Multi-MSI has more requirements than just allocating
multiple MSIs. The allocated MSIs must be allocated as a contiguous
range, be a power of two, aligned on the size of the range, with a
maximum of 32 interrupts. Your approach breaks the alignment requirement.

> By the way, 32 seems a good default value for IRQS_PER_CHUNK.
> So, I want to write another patch to make IRQ_PER_CHUNK a variable
> which can be changed by kernel parameter.
> What do you think of this idea?

It is not great, because it prevents two buses with different
requirements from being used concurrently in the same system. It may not
be an issue for you right now, but I want the ITS driver to be
independent of the bus requirements.

Another, more involved (but also more correct) approach would be to
teach the various glue layers about the requirements of the bus they
support, and pass on the allocation requirements to the core ITS driver.
That way, we would keep the heterogeneous case working.

I can probably help you with that, but this assumes that you've
implemented support for your own bus by providing a glue layer
equivalent to the one we have for other buses.

Thanks,

	M.
-- 
Jazz is not dead. It just smells funny...

^ permalink raw reply	[flat|nested] 11+ messages in thread

* [PATCH]irqchip/irq-gic-v3:Avoid a waste of LPI resource
  2018-04-30 10:20     ` Marc Zyngier
@ 2018-05-03  6:46       ` Zhang, Lei
  0 siblings, 0 replies; 11+ messages in thread
From: Zhang, Lei @ 2018-05-03  6:46 UTC (permalink / raw)
  To: linux-arm-kernel

Hi Marc

> I can probably help you with that, but this assumes that you've
> implemented support for your own bus by providing a glue layer
> equivalent to the one we have for other buses.
Thanks for your comments.
I want to consider implementing a glue layer for our bus. 

> -----Original Message-----
> From: linux-arm-kernel
> [mailto:linux-arm-kernel-bounces at lists.infradead.org] On Behalf Of
> Marc Zyngier
> Sent: Monday, April 30, 2018 7:20 PM
> To: Zhang, Lei/? ?; linux-arm-kernel at lists.infradead.org
> Subject: Re: [PATCH]irqchip/irq-gic-v3:Avoid a waste of LPI resource
> 
> Hi Lei,
> 
> On 30/04/18 08:53, Zhang, Lei wrote:
> > Hi Marc
> >
> > thanks for your opinions.
> >
> >> How many is many? Are they PCI? Or something else?
> >
> >> As it is, this patch will break Multi-MSI and some other platforms
> that
> >> do require a higher allocation granule.
> >>
> >> Depending on whether you're using PCI or some other bus, we can probably
> >> come up with a solution that works for everyone. But I need more
> >> information on this.
> >
> > Actually it is our original interconnect device not on PCI but on our
> > original bus. This device has many sub devices around one thousand,
> > and each sub device requires only a few LPIs.
> 
> OK. Have you implemented your own glue layer for this (like we already
> have for PCI, platform and FSL-MC)? Or are you directly using the
> platform MSI infrastructure?
> 
> > As I explained in point1, each Device ID can still allocate enough
> > LPIs more than IRQS_PER_CHUNK by increasing chunks. So I couldn't
> > understand why this patch will break Multi-MSI.
> 
> Unfortunately, Multi-MSI has more requirements than just allocating
> multiple MSIs. The allocated MSIs must be allocated as a contiguous
> range, be a power of two, aligned on the size of the range, with a
> maximum of 32 interrupts. Your approach breaks the alignment requirement.
> 
> > By the way, 32 seems a good default value for IRQS_PER_CHUNK.
> > So, I want to write another patch to make IRQ_PER_CHUNK a variable
> > which can be changed by kernel parameter.
> > What do you think of this idea?
> 
> It is not great, because it prevents two buses with different
> requirements from being used concurrently in the same system. It may not
> be an issue for you right now, but I want the ITS driver to be
> independent of the bus requirements.
> 
> Another, more involved (but also more correct) approach would be to
> teach the various glue layers about the requirements of the bus they
> support, and pass on the allocation requirements to the core ITS driver.
> That way, we would keep the heterogeneous case working.
> 
> I can probably help you with that, but this assumes that you've
> implemented support for your own bus by providing a glue layer
> equivalent to the one we have for other buses.
> 
> Thanks,
> 
> 	M.
> --
> Jazz is not dead. It just smells funny...
> 
> _______________________________________________
> linux-arm-kernel mailing list
> linux-arm-kernel at lists.infradead.org
> http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 11+ messages in thread

* [PATCH]irqchip/irq-gic-v3:Avoid a waste of LPI resource
  2018-04-30  7:53   ` Zhang, Lei
  2018-04-30 10:20     ` Marc Zyngier
@ 2018-05-09 14:32     ` Mark Langsdorf
  2018-05-10 13:09       ` Zhang, Lei
  1 sibling, 1 reply; 11+ messages in thread
From: Mark Langsdorf @ 2018-05-09 14:32 UTC (permalink / raw)
  To: linux-arm-kernel

On 04/30/2018 02:53 AM, Zhang, Lei wrote:
> Hi Marc
> 
> thanks for your opinions.
> 
>> How many is many? Are they PCI? Or something else?
>   
>> As it is, this patch will break Multi-MSI and some other platforms that
>> do require a higher allocation granule.
>>
>> Depending on whether you're using PCI or some other bus, we can probably
>> come up with a solution that works for everyone. But I need more
>> information on this.
> 
> Actually it is our original interconnect device not on PCI but on our original bus.
> This device has many sub devices around one thousand, and each sub device requires only a few LPIs.
> 
> As I explained in point1, each Device ID can still allocate enough LPIs more than IRQS_PER_CHUNK by increasing chunks.
> So I couldn't understand why this patch will break Multi-MSI.
> 
> By the way, 32 seems a good default value for IRQS_PER_CHUNK.
> So, I want to write another patch to make IRQ_PER_CHUNK a variable which can be changed by kernel parameter.
> What do you think of this idea?

If you intend your devices to be supported by server distributions,  
instead of embedded kernels, having IRQ_PER_CHUNK as a kernel parameter  
is not a good idea. The server distributions are going to compile, test,  
and support a single kernel for all server systems, and are not going to  
have a special kernel to support one vendor's systems that has a  
different IRQ_PER_CHUNK parameter.

So even if you wrote that patch and it was accepted into the kernel, the  
server distribution kernels are still going to set IRQ_PER_CHUNK to 32.

Marc's suggestion of implementing a glue layer, and then changing the  
glue layer interface to pass the allocation requirements up to this  
driver is the best solution that can be supported for server products.

--Mark Langsdorf

^ permalink raw reply	[flat|nested] 11+ messages in thread

* [PATCH]irqchip/irq-gic-v3:Avoid a waste of LPI resource
  2018-05-09 14:32     ` Mark Langsdorf
@ 2018-05-10 13:09       ` Zhang, Lei
  2018-05-10 14:12         ` Marc Zyngier
  0 siblings, 1 reply; 11+ messages in thread
From: Zhang, Lei @ 2018-05-10 13:09 UTC (permalink / raw)
  To: linux-arm-kernel

Hi Mark, Marc

> -----Original Message-----
> From: linux-arm-kernel
> [mailto:linux-arm-kernel-bounces at lists.infradead.org] On Behalf Of
> Mark Langsdorf
> Sent: Wednesday, May 09, 2018 11:32 PM
> To: linux-arm-kernel at lists.infradead.org
> Subject: Re: [PATCH]irqchip/irq-gic-v3:Avoid a waste of LPI resource
> 
 
> Marc's suggestion of implementing a glue layer, and then changing the
> glue layer interface to pass the allocation requirements up to this
> driver is the best solution that can be supported for server products.

Thanks for your comments.
?
I'm considering implementing a glue layer. But the management of 
whole LPI resource is implemented by core ITS driver (irq-gic-v3-its.c),
So I think core ITS driver also need to be modified.
?
Below is my idea for core ITS driver. 
Would you give me comments?
?
Add an interface that can specify the number of LPIs allocation requirements on Core ITS driver.
My idea is extend its_msi_prepare function by adding two argument.

SYNOPSIS
its_msi_prepare(struct irq_domain *domain, struct device *dev,int nvec,
 msi_alloc_info_t *info,int request_nr_lpis, int request_lpis_align)
?
Argument "request_nr_lpis" means request LPIs total number for device. 
Argument "request_lpis_align" means request of LPIs alignment. 0 means do not specify alignment.

For PCI, PCI glue layer specifies, request_nr_lpis = 32, request_lpis_align = 32.
For our device, the glue layer specifies, request_nr_lpis = 1, request_lpis_align = 0.
(We have lots of device but each device only need a single LPI on our original bus. )
?

For expanding two arguments, core ITS driver maybe need to discard chunk mechanism 
and use bitmap to manage LPI resource directly.
?
Example:
 (1) Non PCI glue layer requires ?request_nr_lpis? = 1, ?request_lpis_align? = 0.
?
    Bitmap will become  ...0000000000000000001
?
 (2) PCI glue layer requires ?request_nr_lpis? = 32, ?request_lpis_align? = 32.
?
    Bitmap will become  ...000FFFFFFFF00000001
?
?
 (3) Non PCI glue layer requires ?request_nr_lpis? = 1, ?request_lpis_align? = 0.
?
    Bitmap will become  ...000FFFFFFFF00000003
?
I think it may be easy to implement by using bitmap_find_next_zero_area, because it has ?align_mask? argument.

Regards,
Lei Zhang
--
Lei Zhang  e-mail: zhang.lei at jp.fujitsu.com FUJITSU LIMITED

^ permalink raw reply	[flat|nested] 11+ messages in thread

* [PATCH]irqchip/irq-gic-v3:Avoid a waste of LPI resource
  2018-05-10 13:09       ` Zhang, Lei
@ 2018-05-10 14:12         ` Marc Zyngier
  2018-05-12  1:47           ` Zhang, Lei
  0 siblings, 1 reply; 11+ messages in thread
From: Marc Zyngier @ 2018-05-10 14:12 UTC (permalink / raw)
  To: linux-arm-kernel

Hi Lei,

On 10/05/18 14:09, Zhang, Lei wrote:
> Hi Mark, Marc
> 
>> -----Original Message-----
>> From: linux-arm-kernel
>> [mailto:linux-arm-kernel-bounces at lists.infradead.org] On Behalf Of
>> Mark Langsdorf
>> Sent: Wednesday, May 09, 2018 11:32 PM
>> To: linux-arm-kernel at lists.infradead.org
>> Subject: Re: [PATCH]irqchip/irq-gic-v3:Avoid a waste of LPI resource
>>
>  
>> Marc's suggestion of implementing a glue layer, and then changing the
>> glue layer interface to pass the allocation requirements up to this
>> driver is the best solution that can be supported for server products.
> 
> Thanks for your comments.
> ?
> I'm considering implementing a glue layer. But the management of 
> whole LPI resource is implemented by core ITS driver (irq-gic-v3-its.c),
> So I think core ITS driver also need to be modified.
> ?
> Below is my idea for core ITS driver. 
> Would you give me comments?
> ?
> Add an interface that can specify the number of LPIs allocation requirements on Core ITS driver.
> My idea is extend its_msi_prepare function by adding two argument.
> 
> SYNOPSIS
> its_msi_prepare(struct irq_domain *domain, struct device *dev,int nvec,
>  msi_alloc_info_t *info,int request_nr_lpis, int request_lpis_align)
> ?
> Argument "request_nr_lpis" means request LPIs total number for device. 

I do not believe this is required. nvec already describes the number of
LPIs that are expected by the device. Why should we add a new parameter
that looks to be the exact same thing?

> Argument "request_lpis_align" means request of LPIs alignment. 0 means do not specify alignment.

Another problem is that this approach isn't really practical.
msi_prepare is part of a generic structure, and changing it will impact
all the others users of that structure. I'd rather you don't change its
prototype for something that is implementation specific.

Instead, we have the msi_alloc_info_t structure, which is explicitly
designed to collate information pertaining to the allocation
requirements in an implementation agnostic way.

At the moment, the ITS code only uses the first entry of the scratchpad
area to store the DeviceID (and subsequently a pointer to the
corresponding its_device).

You could use the second entry to encode the alignment requirement.

> For PCI, PCI glue layer specifies, request_nr_lpis = 32, request_lpis_align = 32.
> For our device, the glue layer specifies, request_nr_lpis = 1, request_lpis_align = 0.
> (We have lots of device but each device only need a single LPI on our original bus. )
> ?
> 
> For expanding two arguments, core ITS driver maybe need to discard chunk mechanism 
> and use bitmap to manage LPI resource directly.
> ?
> Example:
>  (1) Non PCI glue layer requires ?request_nr_lpis? = 1, ?request_lpis_align? = 0.
> ?
>     Bitmap will become  ...0000000000000000001
> ?
>  (2) PCI glue layer requires ?request_nr_lpis? = 32, ?request_lpis_align? = 32.
> ?
>     Bitmap will become  ...000FFFFFFFF00000001
> ?
> ?
>  (3) Non PCI glue layer requires ?request_nr_lpis? = 1, ?request_lpis_align? = 0.
> ?
>     Bitmap will become  ...000FFFFFFFF00000003
> ?> I think it may be easy to implement by using
> bitmap_find_next_zero_area, because it has ?align_mask? argument.

The problem is that you now increase the footprint of the bitmap by a
factor of 32:

- For a system with 16bit INTIDs, you go from 256 bytes to 8kB. That's
bad, but OK, fine.

- For a system with 32bit INTIDs, you go from 16MB (which was already
horrible) to 512MB. That's unacceptable.

At that point, using a simple bitmap doesn't work anymore, and we're
better off switching to a free list of some sort that would keep the
memory overhead to a minimum (and actually radically reduce the
footprint even in the smallest configurations).

Thanks,

	M.
-- 
Jazz is not dead. It just smells funny...

^ permalink raw reply	[flat|nested] 11+ messages in thread

* [PATCH]irqchip/irq-gic-v3:Avoid a waste of LPI resource
  2018-05-10 14:12         ` Marc Zyngier
@ 2018-05-12  1:47           ` Zhang, Lei
  2018-05-18  9:49             ` Zhang, Lei
  0 siblings, 1 reply; 11+ messages in thread
From: Zhang, Lei @ 2018-05-12  1:47 UTC (permalink / raw)
  To: linux-arm-kernel

Hi Marc
> -----Original Message-----
> From: linux-arm-kernel
> [mailto:linux-arm-kernel-bounces at lists.infradead.org] On Behalf Of Marc
> Zyngier
> Sent: Thursday, May 10, 2018 11:12 PM
> To: Zhang, Lei/? ?; 'Mark Langsdorf'; linux-arm-kernel at lists.infradead.org
> Subject: Re: [PATCH]irqchip/irq-gic-v3:Avoid a waste of LPI resource
for device.
> 
> I do not believe this is required. nvec already describes the number of
> LPIs that are expected by the device. Why should we add a new parameter
> that looks to be the exact same thing?
yes. As you referred, it seems not really required. I will delete this parameter. 

> 
> Another problem is that this approach isn't really practical.
> msi_prepare is part of a generic structure, and changing it will impact
> all the others users of that structure. I'd rather you don't change its
> prototype for something that is implementation specific.
> 
> Instead, we have the msi_alloc_info_t structure, which is explicitly
> designed to collate information pertaining to the allocation
> requirements in an implementation agnostic way.
> At the moment, the ITS code only uses the first entry of the scratchpad
> area to store the DeviceID (and subsequently a pointer to the
> corresponding its_device).
> 
> You could use the second entry to encode the alignment requirement.
I think it is a good idea for compatibility. I accept your suggestion.
 

> The problem is that you now increase the footprint of the bitmap by a
> factor of 32:
> 
> - For a system with 16bit INTIDs, you go from 256 bytes to 8kB. That's
> bad, but OK, fine.
> 
> - For a system with 32bit INTIDs, you go from 16MB (which was already
> horrible) to 512MB. That's unacceptable.
> 
> At that point, using a simple bitmap doesn't work anymore, and we're
> better off switching to a free list of some sort that would keep the
> memory overhead to a minimum (and actually radically reduce the
> footprint even in the smallest configurations).
I will reconsider how to management LPI with small
memory overhead as possible as I can. 

By the way I will finish my patch in the next week.

Regards,
Lei Zhang
--
Lei Zhang  e-mail: zhang.lei at jp.fujitsu.com FUJITSU LIMITED

^ permalink raw reply	[flat|nested] 11+ messages in thread

* [PATCH]irqchip/irq-gic-v3:Avoid a waste of LPI resource
  2018-05-12  1:47           ` Zhang, Lei
@ 2018-05-18  9:49             ` Zhang, Lei
  2018-05-21  6:12               ` Zhang, Lei
  0 siblings, 1 reply; 11+ messages in thread
From: Zhang, Lei @ 2018-05-18  9:49 UTC (permalink / raw)
  To: linux-arm-kernel

Hi Marc

> -----Original Message-----
> From: linux-arm-kernel
> [mailto:linux-arm-kernel-bounces at lists.infradead.org] On Behalf Of Zhang,
> Lei
> Sent: Saturday, May 12, 2018 10:48 AM
> To: 'Marc Zyngier'; linux-arm-kernel at lists.infradead.org
> Subject: RE: [PATCH]irqchip/irq-gic-v3:Avoid a waste of LPI resource
 
> By the way I will finish my patch in the next week.

I rewrote the mechanism of lpis's management by using free list.
typedef struct lpi_mng {
       struct list_head lpi_list;
       int base;
       int len;
} lpi_mng_t;

When its_chunk_to_lpi is called, the entries of free list as below.
(Assumption id_bits = 30)
 base=8192, len=8192
 base=16384, len=16384
 base=32768, len=32768
 base=65536, len=65536
 base=131072, len=131072
 base=262144, len=262144
 base=524288, len=524288
 base=1048576, len=1048576
 base=2097152, len=2097152
 base=4194304, len=4194304
 base=8388608, len=8388608
 base=16777216, len=16777216
 base=33554432, len=33554432
 base=67108864, len=67108864
 base=134217728, len=134217728
 base=268435456, len=268435456
 base=536870912, len=536870912

When its_lpi_alloc_chunks is called, I split the free list until free list' length equal requested number of lpis.
I guarantee base is aligned on the size of len, and len is always a power of two.

Below is my patch for core ITS driver. 
Would you give me comments?

By the way,I wanted to add a parameter request_lpis_align.
But I think it is not really necessary.
Because if the number of lpis requested is 32, it will be allocated contiguous 32 lpis 
and the first lpi number allocated aligned on 32 even without this parameter.


-----------------
--- a/drivers/irqchip/irq-gic-v3-its.c
+++ b/drivers/irqchip/irq-gic-v3-its.c
@@ -672,82 +672,127 @@ static struct irq_chip its_irq_chip = {
        .irq_compose_msi_msg    = its_irq_compose_msi_msg,
 };

-/*
- * How we allocate LPIs:
- *
- * The GIC has id_bits bits for interrupt identifiers. From there, we
- * must subtract 8192 which are reserved for SGIs/PPIs/SPIs. Then, as
- * we allocate LPIs by chunks of 32, we can shift the whole thing by 5
- * bits to the right.
- *
- * This gives us (((1UL << id_bits) - 8192) >> 5) possible allocations.
- */
-#define IRQS_PER_CHUNK_SHIFT 5
-#define IRQS_PER_CHUNK         (1 << IRQS_PER_CHUNK_SHIFT)
-
-static unsigned long *lpi_bitmap;
-static u32 lpi_chunks;
-static DEFINE_SPINLOCK(lpi_lock);

-static int its_lpi_to_chunk(int lpi)
-{
-       return (lpi - 8192) >> IRQS_PER_CHUNK_SHIFT;
-}
+static struct list_head lpi_free_list;
+static struct list_head lpi_alloc_list;
+typedef struct lpi_mng {
+       struct list_head lpi_list;
+       int base;
+       int len;
+} lpi_mng_t;

-static int its_chunk_to_lpi(int chunk)
-{
-       return (chunk << IRQS_PER_CHUNK_SHIFT) + 8192;
-}
+static DEFINE_SPINLOCK(lpi_lock);

 static int __init its_lpi_init(u32 id_bits)
 {
-       lpi_chunks = its_lpi_to_chunk(1UL << id_bits);

-       lpi_bitmap = kzalloc(BITS_TO_LONGS(lpi_chunks) * sizeof(long),
-                            GFP_KERNEL);
-       if (!lpi_bitmap) {
-               lpi_chunks = 0;
+
+       u32 nr_irq = 1UL << id_bits;
+       lpi_mng_t *lpi_free_mng = NULL;
+       lpi_mng_t *lpi_new = NULL;
+       INIT_LIST_HEAD(&lpi_free_list);
+       INIT_LIST_HEAD(&lpi_alloc_list);
+
+       lpi_free_mng = kzalloc(sizeof(lpi_mng_t), GFP_ATOMIC);
+       if(!lpi_free_mng) {
                return -ENOMEM;
        }
+       lpi_free_mng->base = 0;
+       lpi_free_mng->len = nr_irq;
+       list_add(&lpi_free_mng->lpi_list, &lpi_free_list);
+
+       do {
+               lpi_free_mng = list_first_entry(&lpi_free_list, lpi_mng_t, lpi_list);
+               if(lpi_free_mng->len == 8192) {
+                       /*It is not lpi, so we delete */
+                       if(lpi_free_mng->base == 0) {
+                               list_del_init(&lpi_free_mng->lpi_list);
+                               kfree(lpi_free_mng);
+                               continue;
+                       }
+                       if(lpi_free_mng->base == 8192) {
+                               goto out;
+                       }
+               }
+               if(lpi_free_mng->len > 8192) {
+                       lpi_new  = kzalloc(sizeof (lpi_mng_t),
+                                        GFP_ATOMIC);
+                       if(!lpi_new) {
+                               return -ENOMEM;
+                       }
+                       lpi_free_mng->len /= 2;
+                       lpi_new->base = lpi_free_mng->base + lpi_free_mng->len;
+                       lpi_new->len = lpi_free_mng->len;
+                       list_add(&lpi_new->lpi_list,&lpi_free_mng->lpi_list);
+               }
+       }while(1);

-       pr_info("ITS: Allocated %d chunks for LPIs\n", (int)lpi_chunks);
+out:
+       pr_info("ITS: Allocated %d  LPIs\n", nr_irq - 8192);
        return 0;
 }

+static lpi_mng_t* its_alloc_lpi(int nr_irqs)
+{
+       lpi_mng_t *lpi_alloc_mng = NULL;
+       lpi_mng_t *lpi_split = NULL;
+       lpi_mng_t *lpi_new = NULL;
+       int base;
+
+       base = 0x7fffffff;
+       do {
+               list_for_each_entry(lpi_alloc_mng, &lpi_free_list, lpi_list) {
+                       if(nr_irqs > lpi_alloc_mng->len) {
+                               continue;
+                       }
+                       if(nr_irqs == lpi_alloc_mng->len) {
+                               list_del_init(&lpi_alloc_mng->lpi_list);
+                               list_add(&lpi_alloc_mng->lpi_list, &lpi_alloc_list);
+                               return lpi_alloc_mng;
+                       }
+                       if((nr_irqs < lpi_alloc_mng->len) && (lpi_alloc_mng->base < base)) {
+                               base = lpi_alloc_mng->base;
+                               lpi_split = lpi_alloc_mng;
+                       }
+               }
+               lpi_new  = kzalloc(sizeof (lpi_mng_t),
+                                GFP_ATOMIC);
+               if(!lpi_new || !lpi_split) {
+                       return NULL;
+               }
+
+               lpi_split->len /= 2;
+               lpi_new->base = lpi_split->base + lpi_split->len;
+               lpi_new->len = lpi_split->len;
+               list_add(&lpi_new->lpi_list,&lpi_split->lpi_list);
+
+       } while(1);
+}
+
 static unsigned long *its_lpi_alloc_chunks(int nr_irqs, int *base, int *nr_ids)
 {
        unsigned long *bitmap = NULL;
-       int chunk_id;
-       int nr_chunks;
-       int i;
-
-       nr_chunks = DIV_ROUND_UP(nr_irqs, IRQS_PER_CHUNK);
+       lpi_mng_t *lpi_alloc_mng = NULL;

        spin_lock(&lpi_lock);

-       do {
-               chunk_id = bitmap_find_next_zero_area(lpi_bitmap, lpi_chunks,
-                                                     0, nr_chunks, 0);
-               if (chunk_id < lpi_chunks)
-                       break;
-
-               nr_chunks--;
-       } while (nr_chunks > 0);
+       lpi_alloc_mng = its_alloc_lpi(nr_irqs);

-       if (!nr_chunks)
+       if (!lpi_alloc_mng)
                goto out;

-       bitmap = kzalloc(BITS_TO_LONGS(nr_chunks * IRQS_PER_CHUNK) * sizeof (long),
+       bitmap = kzalloc(BITS_TO_LONGS(nr_irqs) * sizeof (long),
                         GFP_ATOMIC);
        if (!bitmap)
                goto out;

-       for (i = 0; i < nr_chunks; i++)
-               set_bit(chunk_id + i, lpi_bitmap);

-       *base = its_chunk_to_lpi(chunk_id);
-       *nr_ids = nr_chunks * IRQS_PER_CHUNK;
+       *base = lpi_alloc_mng->base;
+       *nr_ids = lpi_alloc_mng->len;

 out:
        spin_unlock(&lpi_lock);
@@ -758,21 +803,47 @@ out:
        return bitmap;
 }

+static void its_joint_free_list(lpi_mng_t *free, lpi_mng_t *alloc)
+{
+       free->len = free->len * 2;
+       if(free->base > alloc->base) {
+               free->base = alloc->base;
+       }
+}
+
 static void its_lpi_free(struct event_lpi_map *map)
 {
        int base = map->lpi_base;
-       int nr_ids = map->nr_lpis;
-       int lpi;
-
+       lpi_mng_t *lpi_alloc_mng = NULL;
+       lpi_mng_t *lpi_free_mng = NULL;
+       bool first_half;
+       int pair_base;
        spin_lock(&lpi_lock);

-       for (lpi = base; lpi < (base + nr_ids); lpi += IRQS_PER_CHUNK) {
-               int chunk = its_lpi_to_chunk(lpi);
-               BUG_ON(chunk > lpi_chunks);
-               if (test_bit(chunk, lpi_bitmap)) {
-                       clear_bit(chunk, lpi_bitmap);
-               } else {
-                       pr_err("Bad LPI chunk %d\n", chunk);
+       list_for_each_entry(lpi_alloc_mng, &lpi_alloc_list, lpi_list) {
+               if(lpi_alloc_mng->base == base) {
+                       list_del_init(&lpi_alloc_mng->lpi_list);
+                       break;
+               }
+
+       }
+
+       first_half = (lpi_alloc_mng->base % (lpi_alloc_mng->len * 2)) ? false : true;
+       if(first_half) {
+               pair_base = lpi_alloc_mng->base + lpi_alloc_mng->len;
+       }else {
+               pair_base = lpi_alloc_mng->base - lpi_alloc_mng->len;
+       }
+
+       list_for_each_entry(lpi_free_mng, &lpi_free_list, lpi_list) {
+               if(lpi_free_mng->base == pair_base) {
+                       its_joint_free_list(lpi_free_mng, lpi_alloc_mng);
+                       kfree(lpi_alloc_mng);
+                       break;
+               }
+               if(lpi_alloc_mng->base  < lpi_free_mng->base) {
+                       list_add_tail(&lpi_alloc_mng->lpi_list, &lpi_free_mng->lpi_list);
+                       break;
                }
        }



Best Regards,
Lei Zhang
--
Lei Zhang  e-mail: zhang.lei at jp.fujitsu.com FUJITSU LIMITED

^ permalink raw reply	[flat|nested] 11+ messages in thread

* [PATCH]irqchip/irq-gic-v3:Avoid a waste of LPI resource
  2018-05-18  9:49             ` Zhang, Lei
@ 2018-05-21  6:12               ` Zhang, Lei
  0 siblings, 0 replies; 11+ messages in thread
From: Zhang, Lei @ 2018-05-21  6:12 UTC (permalink / raw)
  To: linux-arm-kernel

My patch was based old kernel version, So I wrote a new patch based linux-4.17-rc6.
> -----Original Message-----
> From: linux-arm-kernel
> [mailto:linux-arm-kernel-bounces at lists.infradead.org] On Behalf Of Zhang,
> Lei
> Sent: Friday, May 18, 2018 6:49 PM
> To: 'Marc Zyngier'; linux-arm-kernel at lists.infradead.org
> Subject: RE: [PATCH]irqchip/irq-gic-v3:Avoid a waste of LPI resource
> 
> I rewrote the mechanism of lpis's management by using free list.
> 
> Below is my patch for core ITS driver.
> Would you give me comments?

-------------------------------- 
diff --git a/drivers/irqchip/irq-gic-v3-its.c b/drivers/irqchip/irq-gic-v3-its.c
index 5416f2b..a42df4a 100644
--- a/drivers/irqchip/irq-gic-v3-its.c
+++ b/drivers/irqchip/irq-gic-v3-its.c
@@ -1405,82 +1405,122 @@ static int its_irq_set_vcpu_affinity(struct irq_data *d, void *vcpu_info)
        .irq_set_vcpu_affinity  = its_irq_set_vcpu_affinity,
 };

-/*
- * How we allocate LPIs:
- *
- * The GIC has id_bits bits for interrupt identifiers. From there, we
- * must subtract 8192 which are reserved for SGIs/PPIs/SPIs. Then, as
- * we allocate LPIs by chunks of 32, we can shift the whole thing by 5
- * bits to the right.
- *
- * This gives us (((1UL << id_bits) - 8192) >> 5) possible allocations.
- */
-#define IRQS_PER_CHUNK_SHIFT   5
-#define IRQS_PER_CHUNK         (1UL << IRQS_PER_CHUNK_SHIFT)
-#define ITS_MAX_LPI_NRBITS     16 /* 64K LPIs */
+static struct list_head lpi_free_list;
+static struct list_head lpi_alloc_list;
+struct lpi_mng {
+       struct list_head lpi_list;
+       int base;
+       int len;
+};

-static unsigned long *lpi_bitmap;
-static u32 lpi_chunks;
+#define ITS_MAX_LPI_NRBITS     16 /* 64K LPIs */
 static DEFINE_SPINLOCK(lpi_lock);

-static int its_lpi_to_chunk(int lpi)
-{
-       return (lpi - 8192) >> IRQS_PER_CHUNK_SHIFT;
-}
-
-static int its_chunk_to_lpi(int chunk)
-{
-       return (chunk << IRQS_PER_CHUNK_SHIFT) + 8192;
-}

 static int __init its_lpi_init(u32 id_bits)
 {
-       lpi_chunks = its_lpi_to_chunk(1UL << id_bits);
+       u32 nr_irq = 1UL << id_bits;
+       struct lpi_mng *lpi_free_mng = NULL;
+       struct lpi_mng *lpi_new = NULL;
+
+       INIT_LIST_HEAD(&lpi_free_list);
+       INIT_LIST_HEAD(&lpi_alloc_list);

-       lpi_bitmap = kzalloc(BITS_TO_LONGS(lpi_chunks) * sizeof(long),
-                            GFP_KERNEL);
-       if (!lpi_bitmap) {
-               lpi_chunks = 0;
+       lpi_free_mng = kzalloc(sizeof(struct lpi_mng), GFP_KERNEL);
+       if (!lpi_free_mng)
                return -ENOMEM;
-       }

-       pr_info("ITS: Allocated %d chunks for LPIs\n", (int)lpi_chunks);
+       lpi_free_mng->base = 0;
+       lpi_free_mng->len = nr_irq;
+       list_add(&lpi_free_mng->lpi_list, &lpi_free_list);
+
+       do {
+               lpi_free_mng = list_first_entry(&lpi_free_list, struct lpi_mng,
+                       lpi_list);
+               if (lpi_free_mng->len == 8192) {
+                       /*It is not lpi, so we delete */
+                       if (lpi_free_mng->base == 0) {
+                               list_del_init(&lpi_free_mng->lpi_list);
+                               kfree(lpi_free_mng);
+                               continue;
+                       }
+                       if (lpi_free_mng->base == 8192)
+                               goto out;
+               }
+               if (lpi_free_mng->len > 8192) {
+                       lpi_new  = kzalloc(sizeof(struct lpi_mng),
+                                        GFP_ATOMIC);
+                       if (!lpi_new)
+                               return -ENOMEM;
+                       lpi_free_mng->len /= 2;
+                       lpi_new->base = lpi_free_mng->base + lpi_free_mng->len;
+                       lpi_new->len = lpi_free_mng->len;
+                       list_add(&lpi_new->lpi_list, &lpi_free_mng->lpi_list);
+               }
+       } while (1);
+
+out:
+       pr_info("ITS: Allocated %d  LPIs\n", nr_irq - 8192);
        return 0;
 }

+static struct lpi_mng *its_alloc_lpi(int nr_irqs)
+{
+       struct lpi_mng *lpi_alloc_mng = NULL;
+       struct lpi_mng *lpi_split = NULL;
+       struct lpi_mng *lpi_new = NULL;
+       int base;
+
+       base = 0x7fffffff;
+       do {
+               list_for_each_entry(lpi_alloc_mng, &lpi_free_list, lpi_list) {
+                       if (nr_irqs > lpi_alloc_mng->len)
+                               continue;
+                       if (nr_irqs == lpi_alloc_mng->len) {
+                               list_del_init(&lpi_alloc_mng->lpi_list);
+                               list_add(&lpi_alloc_mng->lpi_list,
+                                       &lpi_alloc_list);
+                               return lpi_alloc_mng;
+                       }
+                       if ((nr_irqs < lpi_alloc_mng->len)
+                               && (lpi_alloc_mng->base < base)) {
+                               base = lpi_alloc_mng->base;
+                               lpi_split = lpi_alloc_mng;
+                       }
+               }
+               lpi_new  = kzalloc(sizeof(struct lpi_mng),
+                                GFP_ATOMIC);
+               if (!lpi_new || !lpi_split)
+                       return NULL;
+
+               lpi_split->len /= 2;
+               lpi_new->base = lpi_split->base + lpi_split->len;
+               lpi_new->len = lpi_split->len;
+               list_add(&lpi_new->lpi_list, &lpi_split->lpi_list);
+
+       } while (1);
+}
+
 static unsigned long *its_lpi_alloc_chunks(int nr_irqs, int *base, int *nr_ids)
 {
        unsigned long *bitmap = NULL;
-       int chunk_id;
-       int nr_chunks;
-       int i;
-
-       nr_chunks = DIV_ROUND_UP(nr_irqs, IRQS_PER_CHUNK);
+       struct lpi_mng *lpi_alloc_mng = NULL;

        spin_lock(&lpi_lock);

-       do {
-               chunk_id = bitmap_find_next_zero_area(lpi_bitmap, lpi_chunks,
-                                                     0, nr_chunks, 0);
-               if (chunk_id < lpi_chunks)
-                       break;
-
-               nr_chunks--;
-       } while (nr_chunks > 0);
+       lpi_alloc_mng = its_alloc_lpi(nr_irqs);

-       if (!nr_chunks)
+       if (!lpi_alloc_mng)
                goto out;

-       bitmap = kzalloc(BITS_TO_LONGS(nr_chunks * IRQS_PER_CHUNK) * sizeof (long),
+       bitmap = kzalloc(BITS_TO_LONGS(nr_irqs) * sizeof(long),
                         GFP_ATOMIC);
        if (!bitmap)
                goto out;

-       for (i = 0; i < nr_chunks; i++)
-               set_bit(chunk_id + i, lpi_bitmap);

-       *base = its_chunk_to_lpi(chunk_id);
-       *nr_ids = nr_chunks * IRQS_PER_CHUNK;
+       *base = lpi_alloc_mng->base;
+       *nr_ids = lpi_alloc_mng->len;

 out:
        spin_unlock(&lpi_lock);
@@ -1491,23 +1531,53 @@ static unsigned long *its_lpi_alloc_chunks(int nr_irqs, int *base, int *nr_ids)
        return bitmap;
 }

+static void its_joint_free_list(struct lpi_mng *free, struct lpi_mng *alloc)
+{
+       free->len = free->len * 2;
+       if (free->base > alloc->base)
+               free->base = alloc->base;
+}
+
 static void its_lpi_free_chunks(unsigned long *bitmap, int base, int nr_ids)
 {
-       int lpi;
+       struct lpi_mng *lpi_alloc_mng = NULL;
+       struct lpi_mng *lpi_free_mng = NULL;
+       bool first_half;
+       int pair_base;

        spin_lock(&lpi_lock);

-       for (lpi = base; lpi < (base + nr_ids); lpi += IRQS_PER_CHUNK) {
-               int chunk = its_lpi_to_chunk(lpi);
-
-               BUG_ON(chunk > lpi_chunks);
-               if (test_bit(chunk, lpi_bitmap)) {
-                       clear_bit(chunk, lpi_bitmap);
-               } else {
-                       pr_err("Bad LPI chunk %d\n", chunk);
+       list_for_each_entry(lpi_alloc_mng, &lpi_alloc_list, lpi_list) {
+               if (lpi_alloc_mng->base == base) {
+                       list_del_init(&lpi_alloc_mng->lpi_list);
+                       break;
                }
        }

+       first_half = (lpi_alloc_mng->base % (lpi_alloc_mng->len * 2))
+                        ? false : true;
+       if (first_half)
+               pair_base = lpi_alloc_mng->base + lpi_alloc_mng->len;
+       else
+               pair_base = lpi_alloc_mng->base - lpi_alloc_mng->len;
+
+       // found the other half
+       list_for_each_entry(lpi_free_mng, &lpi_free_list, lpi_list) {
+               if (lpi_free_mng->base == pair_base) {
+                       its_joint_free_list(lpi_free_mng, lpi_alloc_mng);
+                       kfree(lpi_alloc_mng);
+                       goto out;
+               }
+       }
+       // Not found the other half
+       list_for_each_entry(lpi_free_mng, &lpi_free_list, lpi_list) {
+               if (lpi_alloc_mng->base  < lpi_free_mng->base) {
+                       list_add_tail(&lpi_alloc_mng->lpi_list,
+                               &lpi_free_mng->lpi_list);
+                       break;
+               }
+       }
+out:
        spin_unlock(&lpi_lock);

        kfree(bitmap);
@@ -2117,7 +2187,7 @@ static struct its_device *its_create_device(struct its_node *its, u32 dev_id,
         * We allocate at least one chunk worth of LPIs bet device,
         * and thus that many ITEs. The device may require less though.
         */
-       nr_ites = max(IRQS_PER_CHUNK, roundup_pow_of_two(nvecs));
+       nr_ites = max(2UL, roundup_pow_of_two(nvecs));
        sz = nr_ites * its->ite_size;
        sz = max(sz, ITS_ITT_ALIGN) + ITS_ITT_ALIGN - 1;
        itt = kzalloc(sz, GFP_KERNEL);
--------------------------------
Best Regards,
Lei Zhang
--
Lei Zhang  e-mail: zhang.lei at jp.fujitsu.com FUJITSU LIMITED

^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [PATCH]irqchip/irq-gic-v3:Avoid a waste of LPI resource
@ 2018-04-27 14:00 Zhang, Lei
  0 siblings, 0 replies; 11+ messages in thread
From: Zhang, Lei @ 2018-04-27 14:00 UTC (permalink / raw)
  To: linux-arm-kernel

I want to talk about the implementation of GIC-ITS driver.
The current implementation of irq-gic-v3-its driver allocates at least 32 LPIs (interrupt numbers) for each Device ID even if the number of requested LPIs is only one.
I think it is a waste for LPI resource.
And if we use many devices over ITS, this implementation may cause a shortage of LPI .

I have a patch to avoid this problem by reducing the number of LPIs from 32 to 2 per chunk.
The points of this patch are as follows.
Point1:Each Device ID can still allocate enough LPIs by increasing chunk number.
Point2:The size of memory for the bitmap (lpi_bitmap) to manage the free chunks becomes larger(256B -> 4KiB).But I think it is not a problem, because the max size of lpi_bitmap is 4KiB.

Signed-off-by: Lei Zhang <zhang.lei@jp.fujitsu.com>

The follow is my patch. the patch is based on kernel 4.16.5
-------------------------------------
--- drivers/irqchip/irq-gic-v3-its.c       2018-01-29 06:20:33.000000000 +0900
+++ drivers/irqchip/irq-gic-v3-its.c    2018-04-25 15:05:26.078956980 +0900
@@ -1406,12 +1406,12 @@
  *
  * The GIC has id_bits bits for interrupt identifiers. From there, we
  * must subtract 8192 which are reserved for SGIs/PPIs/SPIs. Then, as
- * we allocate LPIs by chunks of 32, we can shift the whole thing by 5
+ * we allocate LPIs by chunks of 2, we can shift the whole thing by 1
  * bits to the right.
  *
  * This gives us (((1UL << id_bits) - 8192) >> 5) possible allocations.
  */
-#define IRQS_PER_CHUNK_SHIFT   5
+#define IRQS_PER_CHUNK_SHIFT   1
 #define IRQS_PER_CHUNK         (1 << IRQS_PER_CHUNK_SHIFT)
 #define ITS_MAX_LPI_NRBITS     16 /* 64K LPIs */

Regards,
Lei Zhang
--
e-mail: zhang.lei@jp.fujitsu.com 

^ permalink raw reply	[flat|nested] 11+ messages in thread

end of thread, other threads:[~2018-05-21  6:12 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
     [not found] <8898674D84E3B24BA3A2D289B872026A69EECF90@G01JPEXMBKW03>
2018-04-27 14:12 ` [PATCH]irqchip/irq-gic-v3:Avoid a waste of LPI resource Marc Zyngier
2018-04-30  7:53   ` Zhang, Lei
2018-04-30 10:20     ` Marc Zyngier
2018-05-03  6:46       ` Zhang, Lei
2018-05-09 14:32     ` Mark Langsdorf
2018-05-10 13:09       ` Zhang, Lei
2018-05-10 14:12         ` Marc Zyngier
2018-05-12  1:47           ` Zhang, Lei
2018-05-18  9:49             ` Zhang, Lei
2018-05-21  6:12               ` Zhang, Lei
2018-04-27 14:00 Zhang, Lei

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.