linux-kernel.vger.kernel.org archive mirror
* Regarding AHCI_MAX_SG and (ATA_HORKAGE_MAX_SEC_1024)
@ 2016-08-07 14:10 Tom Yan
  2016-08-10  3:26 ` Tejun Heo
  0 siblings, 1 reply; 7+ messages in thread
From: Tom Yan @ 2016-08-07 14:10 UTC (permalink / raw)
  To: linux-ide, Tejun Heo, dmilburn; +Cc: linux-scsi, linux-block, linux-kernel

So the (not so) recent bump of BLK_DEF_MAX_SECTORS from 1024 to 2560
(commit d2be537c3ba3) seems to have caused trouble for some ATA
devices, which were then worked around with ATA_HORKAGE_MAX_SEC_1024.

However, I suspect that the bump of BLK_DEF_MAX_SECTORS is not the
"real" cause of the trouble, but rather the fact that AHCI_MAX_SG has
been set to a weird value of 168 (with a comment "hardware max is
64K", which does not seem to make any sense either).

AHCI_MAX_SG is used to set sg_tablesize (i.e. max_segments,
apparently), which in turn seems to determine the actual "request
size" (that is, if it is lower than max_sectors(_kb), it will be the
limiting factor instead).

For example, regardless of whether the drive has max_sectors set to
2560 or to 65535 (by adding it as the Optimal Transfer Length to
libata's SATL, which is also max_hw_sectors that is set from
ATA_MAX_SECTORS_LBA48), "avgrq-sz" in `iostat` will be capped at 1344
(168 * 8).

However, if I change AHCI_MAX_SG to 128 (which is also the
sg_tablesize set in libata.h from LIBATA_MAX_PRD), "avgrq-sz" in
`iostat` will be capped at 1024 (128 * 8), which should make
ATA_HORKAGE_MAX_SEC_1024 unnecessary.

So why has AHCI_MAX_SG been set to 168 anyway?
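
The "168 * 8" and "128 * 8" figures above can be reproduced with a small
sketch. It assumes 4 KiB pages, 512-byte sectors, and that every
scatter-gather segment covers exactly one page; the SKETCH_* names and
the helper are hypothetical and do not come from the kernel.

```c
#include <assert.h>

/* Assumptions for this sketch: 4 KiB pages and 512-byte sectors. */
#define SKETCH_PAGE_SIZE   4096u
#define SKETCH_SECTOR_SIZE 512u

/*
 * Upper bound on the request size, in sectors, when each scatter-gather
 * segment covers exactly one page (no contiguous pages are merged).
 */
static unsigned int sectors_for_segments(unsigned int max_segments)
{
	return max_segments * (SKETCH_PAGE_SIZE / SKETCH_SECTOR_SIZE);
}

/* Compile-time check of the two figures quoted in the text. */
static_assert(168u * (SKETCH_PAGE_SIZE / SKETCH_SECTOR_SIZE) == 1344u,
	      "168 one-page segments");
static_assert(128u * (SKETCH_PAGE_SIZE / SKETCH_SECTOR_SIZE) == 1024u,
	      "128 one-page segments");
```

Under those assumptions, sectors_for_segments(168) gives 1344 and
sectors_for_segments(128) gives 1024, matching the observed caps.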


* Re: Regarding AHCI_MAX_SG and (ATA_HORKAGE_MAX_SEC_1024)
  2016-08-07 14:10 Regarding AHCI_MAX_SG and (ATA_HORKAGE_MAX_SEC_1024) Tom Yan
@ 2016-08-10  3:26 ` Tejun Heo
  2016-08-10 10:04   ` Tom Yan
  0 siblings, 1 reply; 7+ messages in thread
From: Tejun Heo @ 2016-08-10  3:26 UTC (permalink / raw)
  To: Tom Yan; +Cc: linux-ide, dmilburn, linux-scsi, linux-block, linux-kernel

Hello, Tom.

On Sun, Aug 07, 2016 at 10:10:17PM +0800, Tom Yan wrote:
> So the (not so) recent bump of BLK_DEF_MAX_SECTORS from 1024 to 2560
> (commit d2be537c3ba3) seems to have caused trouble for some ATA
> devices, which were then worked around with ATA_HORKAGE_MAX_SEC_1024.
> 
> However, I suspect that the bump of BLK_DEF_MAX_SECTORS is not the
> "real" cause of the trouble, but rather the fact that AHCI_MAX_SG has
> been set to a weird value of 168 (with a comment "hardware max is
> 64K", which does not seem to make any sense either).

Hmmm.. why not?  The hardware limit is 64k and the driver is using a
lower limit of 168, most likely because it doesn't make a noticeable
difference beyond a certain point and it determines the size of the
contiguous memory which has to be allocated for the command table.
Each sg entry is 16 bytes.  Pushing it to the hardware limit would
require an order 9 allocation for each port.
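
As a rough illustration of that arithmetic, the sketch below sizes a
single command table and finds the smallest power-of-two page run that
holds it. It assumes 4 KiB pages and a 128-byte table header; the
SKETCH_* names and both helpers are hypothetical, and the order loop
only mimics what a buddy allocator would need.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_PAGE_SIZE   4096u /* assumption: 4 KiB pages */
#define SKETCH_TBL_HDR_SZ  128u  /* assumption: 128-byte command table header */
#define SKETCH_SG_ENTRY_SZ 16u   /* "Each sg entry is 16 bytes" */

/* Size in bytes of one command table holding max_sg entries. */
static size_t cmd_tbl_size(size_t max_sg)
{
	return SKETCH_TBL_HDR_SZ + max_sg * SKETCH_SG_ENTRY_SZ;
}

/* Smallest order such that (page size << order) >= size. */
static unsigned int alloc_order(size_t size)
{
	unsigned int order = 0;
	size_t span = SKETCH_PAGE_SIZE;

	while (span < size) {
		span <<= 1;
		order++;
	}
	return order;
}
```

With 168 entries a single table is 2816 bytes and fits in one page
(order 0); with 65535 entries it is 1048688 bytes, just over 1 MiB,
which is where order 9 comes from for one table under these assumptions.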

> AHCI_MAX_SG is used to set sg_tablesize (i.e. max_segments,
> apparently), which in turn seems to determine the actual "request
> size" (that is, if it is lower than max_sectors(_kb), it will be the
> limiting factor instead).
>
> For example, no matter if the drive has max_sectors set to 2560, or to
> 65535 (by adding it as the Optimal Transfer Length to libata's SATL,
> which is also max_hw_sectors that is set from ATA_MAX_SECTORS_LBA48),
> "avgrq-sz" in `iostat` will be capped at 1344 (168 * 8).

Not necessarily.  A single sg entry can point to an area larger than
PAGE_SIZE.
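
A minimal sketch of why the cap is not fixed at 168 * 8: the segment
count only bounds the request together with the per-segment size, and
the result is then clamped by max_sectors. The function and parameter
names are illustrative only, and 512-byte sectors are assumed.

```c
#include <assert.h>

#define SKETCH_SECTOR_SIZE 512u /* assumption: 512-byte sectors */

/*
 * Largest request, in sectors, permitted by the segment limits alone,
 * then clamped by max_sectors.  All names here are hypothetical.
 */
static unsigned int request_cap_sectors(unsigned int max_segments,
					unsigned int max_segment_size,
					unsigned int max_sectors)
{
	unsigned long long by_segments =
		(unsigned long long)max_segments * max_segment_size /
		SKETCH_SECTOR_SIZE;

	return by_segments < max_sectors ? (unsigned int)by_segments
					 : max_sectors;
}
```

If every segment is a single 4096-byte page, 168 segments give 1344
sectors; if segments can be as large as, say, 65536 bytes, the same 168
segments allow far more and max_sectors (2560 here) becomes the limit.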

> However, if I change AHCI_MAX_SG to 128 (which is also the
> sg_tablesize set in libata.h from LIBATA_MAX_PRD), "avgrq-sz" in
> `iostat` will be capped at 1024 (128 * 8), which should make
> ATA_HORKAGE_MAX_SEC_1024 unnecessary.
> 
> So why has AHCI_MAX_SG been set to 168 anyway?

As written above, that probably makes the ahci command table size
nicely aligned.

Thanks.

-- 
tejun


* Re: Regarding AHCI_MAX_SG and (ATA_HORKAGE_MAX_SEC_1024)
  2016-08-10  3:26 ` Tejun Heo
@ 2016-08-10 10:04   ` Tom Yan
  2016-08-10 15:14     ` Tejun Heo
  0 siblings, 1 reply; 7+ messages in thread
From: Tom Yan @ 2016-08-10 10:04 UTC (permalink / raw)
  To: Tejun Heo; +Cc: linux-ide, dmilburn, linux-scsi, linux-block, linux-kernel

On 10 August 2016 at 11:26, Tejun Heo <tj@kernel.org> wrote:
> Hmmm.. why not?  The hardware limit is 64k and the driver is using a

Is that referring to the maximum number of entries allowed in the
PRDT, Physical Region Descriptor Table (which is, more precisely,
65535)?

> lower limit of 168, most likely because it doesn't make a noticeable
> difference beyond a certain point and it determines the size of the
> contiguous memory which has to be allocated for the command table.
> Each sg entry is 16 bytes.  Pushing it to the hardware limit would
> require an order 9 allocation for each port.

That makes sense to me, and I didn't have the intention to push it to
the limit anyway.

> Not necessarily.  A single sg entry can point to an area larger than
> PAGE_SIZE.

You mean the 4MB limit of "Data Byte Count" in "DW3: Description
Information" of the PRDT? Is that what max_segment_size (which is set
to a general fallback of 65536:
http://lxr.free-electrons.com/ident?i=dma_get_max_seg_size) is about
in this case?

And my point was, it will be a multiple of 168 anyway, if 1344 is just
an example.

> As written above, that probably makes the ahci command table size
> nicely aligned.

I think that's what bothers me ultimately, cause I don't see how 168
makes it (more) nicely aligned (or even, aligned to what?).

I even checked out the AHCI driver of FreeBSD
(https://github.com/freebsd/freebsd/blob/master/sys/dev/ahci/ahci.h):

...
#define MAXPHYS 512 * 1024
...
#define AHCI_SG_ENTRIES (roundup(btoc(MAXPHYS) + 1, 8))
...
#define AHCI_CT_SIZE (128 + AHCI_SG_ENTRIES * 16)
...

I couldn't make sense of the `+ 1` and the rounding up to a multiple
of 8 either.
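
For reference, the FreeBSD macros quoted above can be evaluated with
stand-ins. This is only a sketch assuming 4 KiB pages; SKETCH_BTOC and
SKETCH_ROUNDUP are hypothetical re-implementations, not the FreeBSD
definitions.

```c
#include <assert.h>

/* Stand-ins for the FreeBSD macros, assuming 4 KiB pages. */
#define SKETCH_PAGE_SIZE 4096u
#define SKETCH_MAXPHYS   (512u * 1024u)

/* Bytes to pages, rounding up (the role btoc() plays above). */
#define SKETCH_BTOC(x)       (((x) + SKETCH_PAGE_SIZE - 1u) / SKETCH_PAGE_SIZE)
/* Round x up to the next multiple of y. */
#define SKETCH_ROUNDUP(x, y) ((((x) + (y) - 1u) / (y)) * (y))

#define SKETCH_SG_ENTRIES \
	(SKETCH_ROUNDUP(SKETCH_BTOC(SKETCH_MAXPHYS) + 1u, 8u))
#define SKETCH_CT_SIZE    (128u + SKETCH_SG_ENTRIES * 16u)

/* Function wrappers so the values are easy to inspect at run time. */
static unsigned int sg_entries(void) { return SKETCH_SG_ENTRIES; }
static unsigned int ct_size(void)    { return SKETCH_CT_SIZE; }
```

Under these assumptions the result is 136 entries and a 2304-byte table.
One possible reading, offered only as a guess: a 512 KiB buffer that
does not start on a page boundary can touch 129 pages, which would
explain the `+ 1`, and 8 entries of 16 bytes are 128 bytes, so rounding
to a multiple of 8 keeps the table size a multiple of 128.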


* Re: Regarding AHCI_MAX_SG and (ATA_HORKAGE_MAX_SEC_1024)
  2016-08-10 10:04   ` Tom Yan
@ 2016-08-10 15:14     ` Tejun Heo
  2016-08-10 15:41       ` David Milburn
  0 siblings, 1 reply; 7+ messages in thread
From: Tejun Heo @ 2016-08-10 15:14 UTC (permalink / raw)
  To: Tom Yan; +Cc: linux-ide, dmilburn, linux-scsi, linux-block, linux-kernel

Hello, Tom.

On Wed, Aug 10, 2016 at 06:04:10PM +0800, Tom Yan wrote:
> On 10 August 2016 at 11:26, Tejun Heo <tj@kernel.org> wrote:
> > Hmmm.. why not?  The hardware limit is 64k and the driver is using a
> 
> Is that referring to the maximum number of entries allowed in the
> PRDT, Physical Region Descriptor Table (which is, more precisely,
> 65535)?

Yeap.

> > Not necessarily.  A single sg entry can point to an area larger than
> > PAGE_SIZE.
> 
> You mean the 4MB limit of "Data Byte Count" in "DW3: Description
> Information" of the PRDT? Is that what max_segment_size (which is set
> to a general fallback of 65536:
> http://lxr.free-electrons.com/ident?i=dma_get_max_seg_size) is about
> in this case?

Ah, ahci isn't setting the hardware limit properly but yeah that's the
maximum segment size.

> And my point was, it will be a multiple of 168 anyway, if 1344 is just
> an example.
> 
> > As written above, that probably makes the ahci command table size
> > nicely aligned.
> 
> I think that's what bothers me ultimately, cause I don't see how 168
> makes it (more) nicely aligned (or even, aligned to what?).

Hmmm... Looked at the sizes and they don't seem to align to anything
meaningful.  No idea.

Thanks.

-- 
tejun


* Re: Regarding AHCI_MAX_SG and (ATA_HORKAGE_MAX_SEC_1024)
  2016-08-10 15:14     ` Tejun Heo
@ 2016-08-10 15:41       ` David Milburn
  2016-08-10 17:19         ` Tom Yan
  0 siblings, 1 reply; 7+ messages in thread
From: David Milburn @ 2016-08-10 15:41 UTC (permalink / raw)
  To: Tejun Heo; +Cc: Tom Yan, linux-ide, linux-scsi, linux-block, linux-kernel

Hi,

On 08/10/2016 10:14 AM, Tejun Heo wrote:
> Hello, Tom.
>
> On Wed, Aug 10, 2016 at 06:04:10PM +0800, Tom Yan wrote:
>> On 10 August 2016 at 11:26, Tejun Heo <tj@kernel.org> wrote:
>>> Hmmm.. why not?  The hardware limit is 64k and the driver is using a
>>
>> Is that referring to the maximum number of entries allowed in the
>> PRDT, Physical Region Descriptor Table (which is, more precisely,
>> 65535)?
>
> Yeap.
>
>>> Not necessarily.  A single sg entry can point to an area larger than
>>> PAGE_SIZE.
>>
>> You mean the 4MB limit of "Data Byte Count" in "DW3: Description
>> Information" of the PRDT? Is that what max_segment_size (which is set
>> to a general fallback of 65536:
>> http://lxr.free-electrons.com/ident?i=dma_get_max_seg_size) is about
>> in this case?
>
> Ah, ahci isn't setting the hardware limit properly but yeah that's the
> maximum segment size.
>
>> And my point was, it will be a multiple of 168 anyway, if 1344 is just
>> an example.
>>
>>> As written above, that probably makes the ahci command table size
>>> nicely aligned.
>>
>> I think that's what bothers me ultimately, cause I don't see how 168
>> makes it (more) nicely aligned (or even, aligned to what?).
>
> Hmmm... Looked at the sizes and they don't seem to align to anything
> meaningful.  No idea.

The 168 makes AHCI_CMD_TBL_SZ equal to 2816

AHCI_CMD_TBL_SZ = AHCI_CMD_TBL_HDR_SZ + (AHCI_MAX_SG * 16)
AHCI_CMD_TBL_SZ = 128 + (168 * 16)

I think if you add in AHCI_CMD_SLOT_SZ (1024) and AHCI_RX_FIS_SZ (256),
the total comes to 4096 bytes, so the DMA area is 4K aligned. That is
probably where the 168 came from.
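
The sum can be checked, and inverted, with a short sketch. The sizes
are the ones quoted above; the SKETCH_* names and both helpers are
hypothetical.

```c
#include <assert.h>

enum {
	SKETCH_CMD_SLOT_SZ    = 1024, /* AHCI_CMD_SLOT_SZ as quoted above */
	SKETCH_RX_FIS_SZ      = 256,  /* AHCI_RX_FIS_SZ as quoted above */
	SKETCH_CMD_TBL_HDR_SZ = 128,  /* AHCI_CMD_TBL_HDR_SZ as quoted above */
	SKETCH_SG_ENTRY_SZ    = 16,
};

/* Command slots + one command table with max_sg entries + RX FIS area. */
static int footprint(int max_sg)
{
	return SKETCH_CMD_SLOT_SZ + SKETCH_RX_FIS_SZ +
	       SKETCH_CMD_TBL_HDR_SZ + max_sg * SKETCH_SG_ENTRY_SZ;
}

/* Largest sg count whose footprint still fits in the given byte budget. */
static int max_sg_for_budget(int budget)
{
	return (budget - SKETCH_CMD_SLOT_SZ - SKETCH_RX_FIS_SZ -
		SKETCH_CMD_TBL_HDR_SZ) / SKETCH_SG_ENTRY_SZ;
}
```

With a 4096-byte budget, max_sg_for_budget() yields exactly 168, and
footprint(168) is 4096, which is consistent with the guess above.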

Thanks,
David


* Re: Regarding AHCI_MAX_SG and (ATA_HORKAGE_MAX_SEC_1024)
  2016-08-10 15:41       ` David Milburn
@ 2016-08-10 17:19         ` Tom Yan
  2016-08-10 19:38           ` David Milburn
  0 siblings, 1 reply; 7+ messages in thread
From: Tom Yan @ 2016-08-10 17:19 UTC (permalink / raw)
  To: David Milburn; +Cc: Tejun Heo, linux-ide, linux-scsi, linux-block, LKML

On 10 August 2016 at 15:41, David Milburn <dmilburn@redhat.com> wrote:
> Hi,
>
> The 168 makes AHCI_CMD_TBL_SZ equal to 2816
>
> AHCI_CMD_TBL_SZ = AHCI_CMD_TBL_HDR_SZ + (AHCI_MAX_SG * 16)
> AHCI_CMD_TBL_SZ = 128 + (168 * 16)
>
> I think if you add in AHCI_CMD_SLOT_SZ (1024) and AHCI_RX_FIS_SZ (256)
> the DMA is 4K aligned, I think that is where the 168 came from.

Looks like the right guess. Though AHCI_PORT_PRIV_DMA_SZ is not:

AHCI_CMD_SLOT_SZ (1024) + AHCI_CMD_TBL_SZ (2816) + AHCI_RX_FIS_SZ (256) = 4096

but:

AHCI_CMD_SLOT_SZ (1024) + AHCI_CMD_TBL_AR_SZ (2816 * 32 = 90112) +
AHCI_RX_FIS_SZ (256) = 91392

and AHCI_PORT_PRIV_FBS_DMA_SZ is:

AHCI_CMD_SLOT_SZ (1024) + AHCI_CMD_TBL_AR_SZ (2816 * 32 = 90112) +
AHCI_RX_FIS_SZ * 16 (4096) = 95232
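
The two totals, and how far each is from a 4K multiple, can be checked
with the sketch below. The values are those quoted in this thread; the
SKETCH_* names and helpers are hypothetical.

```c
#include <assert.h>

enum {
	SKETCH_MAX_CMDS      = 32,
	SKETCH_CMD_SLOT_SZ   = 1024,
	SKETCH_CMD_TBL_SZ    = 2816,
	SKETCH_CMD_TBL_AR_SZ = SKETCH_CMD_TBL_SZ * SKETCH_MAX_CMDS,
	SKETCH_RX_FIS_SZ     = 256,
};

/* Total per-port DMA size for a given received-FIS area size. */
static unsigned int port_priv_dma_sz(unsigned int rx_fis_sz)
{
	return SKETCH_CMD_SLOT_SZ + SKETCH_CMD_TBL_AR_SZ + rx_fis_sz;
}

/* Bytes left over after the last whole 4096-byte block. */
static unsigned int remainder_4k(unsigned int size)
{
	return size % 4096u;
}
```

Neither total is a multiple of 4096: 91392 leaves 1280 bytes over and
95232 leaves 1024.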

>
> Thanks,
> David
>
>


* Re: Regarding AHCI_MAX_SG and (ATA_HORKAGE_MAX_SEC_1024)
  2016-08-10 17:19         ` Tom Yan
@ 2016-08-10 19:38           ` David Milburn
  0 siblings, 0 replies; 7+ messages in thread
From: David Milburn @ 2016-08-10 19:38 UTC (permalink / raw)
  To: Tom Yan; +Cc: Tejun Heo, linux-ide, linux-scsi, linux-block, LKML

On 08/10/2016 12:19 PM, Tom Yan wrote:
> On 10 August 2016 at 15:41, David Milburn <dmilburn@redhat.com> wrote:
>> Hi,
>>
>> The 168 makes AHCI_CMD_TBL_SZ equal to 2816
>>
>> AHCI_CMD_TBL_SZ = AHCI_CMD_TBL_HDR_SZ + (AHCI_MAX_SG * 16)
>> AHCI_CMD_TBL_SZ = 128 + (168 * 16)
>>
>> I think if you add in AHCI_CMD_SLOT_SZ (1024) and AHCI_RX_FIS_SZ (256)
>> the DMA is 4K aligned, I think that is where the 168 came from.
>
> Looks like the right guess. Though AHCI_PORT_PRIV_DMA_SZ is not:
>
> AHCI_CMD_SLOT_SZ (1024) + AHCI_CMD_TBL_SZ (2816) + AHCI_RX_FIS_SZ (256) = 4096
>
> but:
>
> AHCI_CMD_SLOT_SZ (1024) + AHCI_CMD_TBL_AR_SZ (2816 * 32 = 90112) +
> AHCI_RX_FIS_SZ (256) = 91392
>
> and AHCI_PORT_PRIV_FBS_DMA_SZ is:
>
> AHCI_CMD_SLOT_SZ (1024) + AHCI_CMD_TBL_AR_SZ (2816 * 32 = 90112) +
> AHCI_RX_FIS_SZ * 16 (4096) = 95232
>

Yes, but in both cases mem_dma gets adjusted for AHCI_CMD_SLOT_SZ (1024)
and rx_fis_sz (256 or 4096 in fbs case).
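
A sketch of that adjustment, assuming the port buffer is handed out in
the order command slots, received-FIS area, command tables. The struct,
the helpers and the SKETCH_* names are hypothetical; only the sizes
come from this thread.

```c
#include <assert.h>

enum {
	SKETCH_CMD_SLOT_SZ = 1024,
	SKETCH_CMD_TBL_SZ  = 2816,
};

/* Byte offsets of the three regions from the start of the DMA buffer. */
struct sketch_layout {
	unsigned int cmd_slot_off;
	unsigned int rx_fis_off;
	unsigned int cmd_tbl_off;
};

/* Walk the buffer the way a running "mem_dma" pointer would be advanced. */
static struct sketch_layout carve_port_buffer(unsigned int rx_fis_sz)
{
	struct sketch_layout l;
	unsigned int off = 0;

	l.cmd_slot_off = off;
	off += SKETCH_CMD_SLOT_SZ;
	l.rx_fis_off = off;
	off += rx_fis_sz;
	l.cmd_tbl_off = off;
	return l;
}

/* Offset of the command table belonging to a given tag (0..31). */
static unsigned int cmd_tbl_off_for_tag(unsigned int rx_fis_sz,
					unsigned int tag)
{
	return carve_port_buffer(rx_fis_sz).cmd_tbl_off +
	       tag * SKETCH_CMD_TBL_SZ;
}
```

Under this layout the first command table starts at offset 1280
(rx_fis_sz 256) or 5120 (rx_fis_sz 4096). Both offsets, and the
2816-byte stride, are multiples of 128, which matters if command tables
must be 128-byte aligned; that requirement is assumed here, not taken
from this thread.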

Thanks,
David

