From: Sunil Kovvuri <sunil.kovvuri@gmail.com>
To: Catalin Marinas <catalin.marinas@arm.com>
Cc: Ganesh Mahendran <opensource.ganesh@gmail.com>,
	open list <linux-kernel@vger.kernel.org>,
	"Chalamarla, Tirumalesh" <Tirumalesh.Chalamarla@cavium.com>,
	"open list:ARM/QUALCOMM SUPPORT" <linux-arm-msm@vger.kernel.org>,
	Imran Khan <kimran@codeaurora.org>,
	"linux-arm-kernel@lists.infradead.org" 
	<linux-arm-kernel@lists.infradead.org>,
	sgoutham@caviumnetworks.com
Subject: Re: [PATCH] Revert "arm64: Increase the max granular size"
Date: Wed, 19 Apr 2017 18:41:46 +0530	[thread overview]
Message-ID: <CA+sq2Cc8tqpGcUKkLh22e7_TZqu5jmrV9XKddVnxGfDsoSBf3A@mail.gmail.com> (raw)
In-Reply-To: <20170419120114.GB3238@e104818-lin.cambridge.arm.com>

On Wed, Apr 19, 2017 at 5:31 PM, Catalin Marinas
<catalin.marinas@arm.com> wrote:
> On Tue, Apr 18, 2017 at 10:35:02PM +0530, Sunil Kovvuri wrote:
>> On Tue, Apr 18, 2017 at 8:18 PM, Catalin Marinas
>> <catalin.marinas@arm.com> wrote:
>> > On Mon, Apr 17, 2017 at 04:08:52PM +0530, Sunil Kovvuri wrote:
>> >> >>     >> Do you have an explanation on the performance variation when
>> >> >>     >> L1_CACHE_BYTES is changed? We'd need to understand how the network stack
>> >> >>     >> is affected by L1_CACHE_BYTES, in which context it uses it (is it for
>> >> >>     >> non-coherent DMA?).
>> >> >>     >
>> >> >>     > The network stack uses SKB_DATA_ALIGN to align.
>> >> >>     > ---
>> >> >>     > #define SKB_DATA_ALIGN(X) (((X) + (SMP_CACHE_BYTES - 1)) & \
>> >> >>     > ~(SMP_CACHE_BYTES - 1))
>> >> >>     >
>> >> >>     > #define SMP_CACHE_BYTES L1_CACHE_BYTES
>> >> >>     > ---
>> >> >>     > I think this is the reason for the performance regression.
>> >> >>     >
>> >> >>
>> >> >>     Yes, this is the reason for the performance regression. Due to the increased L1 cache
>> >> >>     alignment, the object comes from the next kmalloc slab and skb->truesize changes from
>> >> >>     2304 bytes to 4352 bytes. This in turn increases sk_wmem_alloc, which causes fewer send buffers to be queued.
>> >>
>> >> With what traffic did you check 'skb->truesize'? An increase from
>> >> 2304 to 4352 bytes doesn't seem to be real. I checked with ICMP
>> >> packets of the maximum size possible with a 1500-byte MTU and I
>> >> don't see such a bump. If the bump is observed with iperf sending
>> >> TCP packets, then I suggest checking whether TSO is playing a part here.
>> >
>> > I haven't checked truesize but I added some printks to __alloc_skb() (on
>> > a Juno platform) and the size argument to this function is 1720 on many
>> > occasions. With sizeof(struct skb_shared_info) of 320, the actual data
>> > allocation is exactly 2048 when L1_CACHE_BYTES is 64. With a
>> > 128-byte cache size, it goes slightly over 2K, hence the 4K slab
>> > allocation.
>>
>> Understood, but in my opinion this '4K slab allocation' still cannot
>> be considered an issue with cache line size; there are many network
>> drivers out there which do receive-buffer or page recycling to
>> minimize (sometimes almost to zero) the cost of buffer allocation.
>
> The slab allocation shouldn't make much difference (unless you are
> running on a memory constrained system) but I don't understand how
> skb->truesize (which is almost half unused) affects the sk_wmem_alloc
> and its interaction with other bits in the network stack (e.g.
> tcp_limit_output_bytes).
>
> However, I do think it's worth investigating further to fully understand
> the issue.

Absolutely.

>
>> > The 1720 figure surprised me a bit as well since I was
>> > expecting something close to 1500.
>> >
>> > The thing that worries me is that skb->data may be used as a buffer to
>> > DMA into. If that's the case, skb_shared_info is wrongly aligned based
>> > on SMP_CACHE_BYTES only and can lead to corruption on a non-DMA-coherent
>> > platform. It should really be ARCH_DMA_MINALIGN.
>>
>> I didn't get this, if you see __alloc_skb()
>>
>> 229         size = SKB_DATA_ALIGN(size);
>> 230         size += SKB_DATA_ALIGN(sizeof(struct skb_shared_info));
>>
>> both the DMA buffer and skb_shared_info are aligned to a cacheline
>> separately, and considering that 128-byte alignment guarantees 64-byte
>> alignment as well, how would this lead to corruption?
>
> It's the other way around: you align only to 64 byte while running on a
> platform with 128 byte cache lines and non-coherent DMA.

Okay, I misread your statement. That is indeed a valid point.

>> And if the platform is non-DMA-coherent, then again it's the driver which
>> should take care of coherency by using the appropriate map/unmap APIs and
>> should avoid any cacheline sharing between the DMA buffer and skb_shared_info.
>
> The problem is that the streaming DMA API can only work correctly on
> cacheline-aligned buffers (because of the cache invalidation it performs
> for DMA ops; even with clean&invalidate, the operation isn't always safe
> if a cacheline is shared between DMA and CPU buffers). In the skb case,
> we could have the data potentially sharing the last addresses of a DMA
> buffer with struct skb_shared_info.
>
> We may be able to get away with SKB_DATA_ALIGN not using
> ARCH_DMA_MINALIGN *if* skb_shared_info is *not* written before or during
> an inbound DMA transfer (though such tricks are arch specific).
>
>> > IIUC, the Cavium platform has coherent DMA, so it shouldn't be an issue
>> > if we go back to 64 byte cache lines.
>>
>> Yes, the Cavium platform is DMA-coherent and there is no issue with
>> reverting back to 64-byte cachelines. But do we want to do this just
>> because some platform has a performance issue and this is an easy way
>> to solve it? IMHO there seem to be many ways to address the performance
>> degradation within the driver itself, and if those don't work then it
>> probably makes sense to revert this.
>
> My initial thought was to revert the change because it was causing a
> significant performance regression on certain SoCs. But given that it
> took over a year for people to follow up, it doesn't seem too urgent, so
> we should rather try to understand the issue and potential side effects
> of moving back to a 64 byte cache line.

Yes.

Thanks,
Sunil.

>
> --
> Catalin


Thread overview: 36+ messages
2016-03-16  9:32 [PATCH] Revert "arm64: Increase the max granular size" Ganesh Mahendran
2016-03-16 10:07 ` Will Deacon
2016-03-16 13:06   ` Timur Tabi
2016-03-16 14:03     ` Mark Rutland
2016-03-16 14:35       ` Will Deacon
2016-03-16 14:54         ` Mark Rutland
2016-03-16 14:18     ` Catalin Marinas
2016-03-16 15:26       ` Timur Tabi
2016-03-17 14:27         ` Catalin Marinas
2016-03-17 14:49           ` Timur Tabi
2016-03-17 15:37             ` Catalin Marinas
2016-03-17 16:03               ` Marc Zyngier
2016-03-17 18:07           ` Andrew Pinski
2016-03-17 18:34             ` Timur Tabi
2016-03-17 18:37             ` Catalin Marinas
2016-03-18 21:05 ` Chalamarla, Tirumalesh
2016-03-21  1:56   ` Ganesh Mahendran
2016-03-21 17:14   ` Catalin Marinas
2016-03-21 17:23     ` Will Deacon
2016-03-21 17:33       ` Catalin Marinas
2016-03-21 17:39         ` Chalamarla, Tirumalesh
     [not found]     ` <CAPub14-sFgx=oCHzJPb9h9b_V0rbn5UAMDNJ-yTkjhz38JPqMQ@mail.gmail.com>
     [not found]       ` <10fef112-37f1-0a1b-b5af-435acd032f01@codeaurora.org>
2017-04-06  7:22         ` Imran Khan
2017-04-06 15:58           ` Catalin Marinas
2017-04-07  2:06             ` Ganesh Mahendran
2017-04-07  8:59               ` Catalin Marinas
2017-04-12  5:13               ` Imran Khan
2017-04-12 14:00                 ` Chalamarla, Tirumalesh
2017-04-17  7:35                   ` Imran Khan
2017-04-17 10:38                     ` Sunil Kovvuri
2017-04-18 14:48                       ` Catalin Marinas
2017-04-18 17:05                         ` Sunil Kovvuri
2017-04-19 12:01                           ` Catalin Marinas
2017-04-19 13:11                             ` Sunil Kovvuri [this message]
2017-04-25  6:42                               ` Ding Tianhong
2017-04-18 18:21                     ` Chalamarla, Tirumalesh
2017-04-11  4:40             ` Jon Masters
