All of lore.kernel.org
 help / color / mirror / Atom feed
From: 김재원 <jaewon31.kim@samsung.com>
To: John Stultz <jstultz@google.com>
Cc: "T.J. Mercier" <tjmercier@google.com>,
	"sumit.semwal@linaro.org" <sumit.semwal@linaro.org>,
	"daniel.vetter@ffwll.ch" <daniel.vetter@ffwll.ch>,
	"akpm@linux-foundation.org" <akpm@linux-foundation.org>,
	"hannes@cmpxchg.org" <hannes@cmpxchg.org>,
	"mhocko@kernel.org" <mhocko@kernel.org>,
	"linux-mm@kvack.org" <linux-mm@kvack.org>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	"jaewon31.kim@gmail.com" <jaewon31.kim@gmail.com>
Subject: RE:(2) [PATCH] dma-buf: system_heap: avoid reclaim for order 4
Date: Thu, 26 Jan 2023 13:42:18 +0900	[thread overview]
Message-ID: <20230126044218epcms1p35474178c2f2b18524f35c7d9799e3aed@epcms1p3> (raw)
In-Reply-To: <CANDhNCoAKtHmxFomdGfTfXy8ZvFMfMRj4jZ+b8wMMD+5AmAB0g@mail.gmail.com>

> On Wed, Jan 25, 2023 at 2:20 AM Jaewon Kim <jaewon31.kim@samsung.com> wrote:
> > > > On Tue, Jan 17, 2023 at 10:54 PM John Stultz <jstultz@google.com> wrote:
> > > > >
> > > > > On Tue, Jan 17, 2023 at 12:31 AM Jaewon Kim <jaewon31.kim@samsung.com> wrote:
> > > > > > > Using order 4 pages would be helpful for many IOMMUs, but it could spend
> > > > > > > quite much time in page allocation perspective.
> > > > > > >
> > > > > > > The order 4 allocation with __GFP_RECLAIM may spend much time in
> > > > > > > reclaim and compation logic. __GFP_NORETRY also may affect. These cause
> > > > > > > unpredictable delay.
> > > > > > >
> > > > > > > To get reasonable allocation speed from dma-buf system heap, use
> > > > > > > HIGH_ORDER_GFP for order 4 to avoid reclaim.
> > > > >
> > > > > Thanks for sharing this!
> > > > > The case where the allocation gets stuck behind reclaim under pressure
> > > > > does sound undesirable, but I'd be a bit hesitant to tweak numbers
> > > > > that have been used for a long while (going back to ion) without a bit
> > > > > more data.
> > > > >
> > > > > It might be good to also better understand the tradeoff of potential
> > > > > on-going impact to performance from using low order pages when the
> > > > > buffer is used.  Do you have any details like or tests that you could
> > > > > share to help ensure this won't impact other users?
> > > > >
> > > > > TJ: Do you have any additional thoughts on this?
> > > > >
> > > > I don't have any data on how often we hit reclaim for mid order
> > > > allocations. That would be interesting to know. However the 70th
> > > > percentile of system-wide buffer sizes while running the camera on my
> > > > phone is still only 1 page, so it looks like this change would affect
> > > > a subset of use-cases.
> > > >
> > > > Wouldn't this change make it less likely to get an order 4 allocation
> > > > (under memory pressure)? The commit message makes me think the goal of
> > > > the change is to get more of them.
> > >
> > > Hello John Stultz
> > >
> > > I've been waiting for your next reply.
> 
> Sorry, I was thinking you were gathering data on the tradeoffs. Sorry
> for my confusion.
> 
> > > With my commit, we may gather less number of order 4 pages and fill the
> > > requested size with more number of order 0 pages. I think, howerver, stable
> > > allocation speed is quite important so that corresponding user space
> > > context can move on within a specific time.
> > >
> > > Not only compaction but reclaim also, I think, would be invoked more if the
> > > __GFP_RECLAIM is added on order 4. I expect the reclaim could be decreased
> > > if we move to order 0.
> > >
> >
> > Additionally I'd like to say the old legacy ion system heap also used the
> > __GFP_RECLAIM only for order 8, not for order 4.
> >
> > drivers/staging/android/ion/ion_system_heap.c
> >
> > static gfp_t high_order_gfp_flags = (GFP_HIGHUSER | __GFP_ZERO | __GFP_NOWARN |
> >                                     __GFP_NORETRY) & ~__GFP_RECLAIM;
> > static gfp_t low_order_gfp_flags  = GFP_HIGHUSER | __GFP_ZERO;
> > static const unsigned int orders[] = {8, 4, 0};
> >
> > static int ion_system_heap_create_pools(struct ion_page_pool **pools)
> > {
> >        int i;
> >
> >        for (i = 0; i < NUM_ORDERS; i++) {
> >                struct ion_page_pool *pool;
> >                gfp_t gfp_flags = low_order_gfp_flags;
> >
> >                if (orders[i] > 4)
> >                        gfp_flags = high_order_gfp_flags;
> 
> 
> This seems a bit backwards from your statement. It's only removing
> __GFP_RECLAIM on order 8 (high_order_gfp_flags).

Oh sorry, my fault. I also read wrongly. But as far as I know, most of
AP chipset vendors have been using __GFP_RECLAIM only for order 0.
I can't say in detail though.

> 
> So apologies again, but how is that different from the existing code?
> 
> #define LOW_ORDER_GFP (GFP_HIGHUSER | __GFP_ZERO | __GFP_COMP)
> #define MID_ORDER_GFP (LOW_ORDER_GFP | __GFP_NOWARN)
> #define HIGH_ORDER_GFP  (((GFP_HIGHUSER | __GFP_ZERO | __GFP_NOWARN \
>                                 | __GFP_NORETRY) & ~__GFP_RECLAIM) \
>                                 | __GFP_COMP)
> static gfp_t order_flags[] = {HIGH_ORDER_GFP, MID_ORDER_GFP, LOW_ORDER_GFP};
> 
> Where the main reason we introduced the mid-order flags is to avoid
> the warnings on order 4 allocation failures when we'll fall back to
> order 0
> 
> The only substantial difference I see between the old ion code and
> what we have now is the GFP_COMP addition, which is a bit hazy in my
> memory. I unfortunately don't have a record of why it was added (don't
> have access to my old mail box), so I suspect it was something brought
> up in private review.  Dropping that from the low order flags probably
> makes sense as TJ pointed out, but this isn't what your patch is
> changing.
> 
> Your patch is changing that for mid-order allocations we'll use the
> high order flags, so we'll not retry and not reclaim, so there will be
> more failing and falling back to single page allocations.
> This makes sense to make allocation time faster and more deterministic
> (I like it!), but potentially has the tradeoff of losing the
> performance benefit of using mid order page sizes.
> 
> I suspect your change is a net win overall, as the cumulative effect
> of using larger pages probably won't benefit more than the large
> indeterministic allocation time, particularly under pressure.
> 
> But because your change is different from what the old ion code did, I
> want to be a little cautious. So it would be nice to see some
> evaluation of not just the benefits the patch provides you but also of
> what negative impact it might have.  And so far you haven't provided
> any details there.
> 
> A quick example might be for the use case where mid-order allocations
> are causing you trouble, you could see how the performance changes if
> you force all mid-order allocations to be single page allocations (so
> orders[] = {8, 0, 0};) and compare it with the current code when
> there's no memory pressure (right after reboot when pages haven't been
> fragmented) so the mid-order allocations will succeed.  That will let
> us know the potential downside if we have brief / transient pressure
> at allocation time that forces small pages.
> 
> Does that make sense?

Let me try this. It make take some days. But I guess it depends on memory
status as you said. If there were quite many order 4 pages, then 8 4 0
should be faster than 8 0 0.

I don't know this is a right approach. In my opinion, except the specific
cases like right after reboot, there are not many order 4 pages. And
in determinisitic allocation time perspective, I think avoiding too long
allocations is more important than making faster with already existing
free order 4 pages.

BR
Jaewon Kim

> 
> thanks
> -john

  parent reply	other threads:[~2023-01-26  4:42 UTC|newest]

Thread overview: 14+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
     [not found] <CGME20230117082521epcas1p22a709521a9e6d2346d06ac220786560d@epcas1p2.samsung.com>
2023-01-17  8:25 ` [PATCH] dma-buf: system_heap: avoid reclaim for order 4 Jaewon Kim
     [not found]   ` <CGME20230117082521epcas1p22a709521a9e6d2346d06ac220786560d@epcms1p6>
2023-01-17  8:31     ` Jaewon Kim
2023-01-18  6:54       ` John Stultz
2023-01-18 19:55         ` T.J. Mercier
     [not found]         ` <CGME20230117082521epcas1p22a709521a9e6d2346d06ac220786560d@epcms1p2>
2023-01-25  9:56           ` Jaewon Kim
2023-01-25 10:19           ` Jaewon Kim
2023-01-25 20:32             ` John Stultz
     [not found]             ` <CGME20230117082521epcas1p22a709521a9e6d2346d06ac220786560d@epcms1p3>
2023-01-26  4:42               ` 김재원 [this message]
2023-01-26  5:04                 ` (2) " John Stultz
     [not found]       ` <CGME20230117082521epcas1p22a709521a9e6d2346d06ac220786560d@epcms1p8>
2023-01-18  7:30         ` Jaewon Kim
2023-02-04 15:02         ` Jaewon Kim
2023-02-07  4:37           ` (2) " John Stultz
     [not found]           ` <CGME20230117082521epcas1p22a709521a9e6d2346d06ac220786560d@epcms1p1>
2023-02-07  7:33             ` Jaewon Kim
2023-02-07 16:56               ` John Stultz

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20230126044218epcms1p35474178c2f2b18524f35c7d9799e3aed@epcms1p3 \
    --to=jaewon31.kim@samsung.com \
    --cc=akpm@linux-foundation.org \
    --cc=daniel.vetter@ffwll.ch \
    --cc=hannes@cmpxchg.org \
    --cc=jaewon31.kim@gmail.com \
    --cc=jstultz@google.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=mhocko@kernel.org \
    --cc=sumit.semwal@linaro.org \
    --cc=tjmercier@google.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.