From: Jesper Dangaard Brouer <brouer@redhat.com>
To: Mel Gorman <mgorman@techsingularity.net>
Cc: Chuck Lever <chuck.lever@oracle.com>,
	Mel Gorman <mgorman@suse.de>,
	Linux NFS Mailing List <linux-nfs@vger.kernel.org>,
	"linux-mm@kvack.org" <linux-mm@kvack.org>,
	Jakub Kicinski <kuba@kernel.org>,
	brouer@redhat.com
Subject: Re: alloc_pages_bulk()
Date: Thu, 11 Feb 2021 13:26:28 +0100
Message-ID: <20210211132628.1fe4f10b@carbon>
In-Reply-To: <20210211091235.GC3697@techsingularity.net>

On Thu, 11 Feb 2021 09:12:35 +0000
Mel Gorman <mgorman@techsingularity.net> wrote:

> On Wed, Feb 10, 2021 at 10:58:37PM +0000, Chuck Lever wrote:
> > > Not in the short term due to bug load and other obligations.
> > > 
> > > The original series had "mm, page_allocator: Only use per-cpu allocator
> > > for irq-safe requests" but that was ultimately rejected because softirqs
> > > were affected so it would have to be done without that patch.
> > > 
> > > The last patch can be rebased easily enough but it only batch allocates
> > > order-0 pages. It's also only build tested and could be completely
> > > miserable in practice and as I didn't even try to boot test it, let alone
> > > actually test it, it could be a giant pile of crap. To make high orders
> > > work, it would need significant reworking but if the API showed even
> > > partial benefit, it might motivate someone to reimplement the bulk
> > > interfaces to perform better.
> > > 
> > > Rebased diff, build tested only, might not even work  
> > 
> > Thanks, Mel, for kicking off a forward port.
> > 
> > It compiles. I've added a patch to replace the page allocation loop
> > in svc_alloc_arg() with a call to alloc_pages_bulk().
> > 
> > The server system deadlocks pretty quickly with any NFS traffic. Based
> > on some initial debugging, it appears that a pcplist is getting corrupted
> > and this causes the list_del() in __rmqueue_pcplist() to fail during
> > a call to alloc_pages_bulk().
> >   
> 
> Parameters to __rmqueue_pcplist are garbage as the parameter order changed.
> I'm surprised it didn't blow up in a spectacular fashion. Again, this
> hasn't been near any testing and passing a list with high orders to
> free_pages_bulk() will corrupt lists too. Mostly it's a curiosity to see
> if there is justification for reworking the allocator to fundamentally
> deal in batches and then feed batches to pcp lists and the bulk allocator
> while leaving the normal GFP API as single page "batches". While that
> would be ideal, it's relatively high risk for regressions. There is still
> some scope for adding a basic bulk allocator before considering a major
> refactoring effort.

The alloc_flags parameter reminds me that I have some asks around the semantics
of the API.  I'm concerned about the latency impact on preemption.  I
want us to avoid creating something that runs for too long with
IRQs/preempt disabled.

(With SLUB kmem_cache_free_bulk() we manage to run most of the time with
preempt and IRQs enabled.  So, I'm not worried about large slab bulk
free. For SLUB kmem_cache_alloc_bulk() we run with local_irq_disable(),
so I always recommend users not to do excessive bulk-alloc.)
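
(To make this concrete, here is a minimal sketch of the caller pattern
I have in mind for the SLUB bulk API; the cache pointer and batch size
are just illustrative:)

#include <linux/slab.h>

#define DEMO_BATCH 16	/* small batch keeps the irq-off window short */

static int demo_slab_bulk(struct kmem_cache *cachep)
{
	void *objs[DEMO_BATCH];

	/*
	 * kmem_cache_alloc_bulk() runs with local IRQs disabled, so the
	 * batch size directly bounds the irq-off time.  It returns the
	 * number of objects allocated, or 0 on failure.
	 */
	if (!kmem_cache_alloc_bulk(cachep, GFP_KERNEL, DEMO_BATCH, objs))
		return -ENOMEM;

	/* ... use the objects ... */

	/* Bulk free runs mostly with preempt and IRQs enabled. */
	kmem_cache_free_bulk(cachep, DEMO_BATCH, objs);
	return 0;
}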

For this page bulk alloc API, I'm fine with limiting it to only support
order-0 pages. (This will also fit nicely with the PCP system, I think.)

I also suggest that the API be allowed to return fewer pages than
requested, because I want it to "exit"/return early if it needs to go
into an expensive code path (like the buddy allocator or compaction).
I'm assuming we have a flag to give us this behavior (via gfp_flags or
alloc_flags)?

My use-case is in page_pool, where I can easily handle not getting the
exact number of pages requested, and where I need to handle low-latency
network traffic.
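
To illustrate, a minimal caller sketch as I imagine it from the
page_pool side (example_alloc_pages_bulk() is a made-up name standing
in for the bulk API, not the proposed function):

#include <linux/gfp.h>

/* Hypothetical bulk-alloc primitive (illustrative only): fills 'pages'
 * with up to 'count' order-0 pages and returns how many it actually
 * got, bailing out rather than entering the buddy/compaction slow path.
 */
int example_alloc_pages_bulk(gfp_t gfp, int count, struct page **pages);

static int refill_pages(struct page **pages, int want)
{
	int got, i;

	got = example_alloc_pages_bulk(GFP_ATOMIC, want, pages);

	/* Fall back to single-page allocations for the remainder. */
	for (i = got; i < want; i++) {
		pages[i] = alloc_page(GFP_ATOMIC);
		if (!pages[i])
			break;
	}
	return i;	/* number of pages actually obtained */
}

The point is that getting fewer pages than requested is cheap to handle
on my side, whereas extra latency in the bulk call is not.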



> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index f8353ea7b977..8f3fe7de2cf7 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -5892,7 +5892,7 @@ __alloc_pages_bulk_nodemask(gfp_t gfp_mask, unsigned int order,
>  	pcp_list = &pcp->lists[migratetype];
>  
>  	while (nr_pages) {
> -		page = __rmqueue_pcplist(zone, gfp_mask, migratetype,
> +		page = __rmqueue_pcplist(zone, migratetype, alloc_flags,
>  								pcp, pcp_list);
>  		if (!page)
>  			break;
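
(For reference, the __rmqueue_pcplist() signature as I read it in
current mm/page_alloc.c, which is what the corrected argument order
above has to match:)

/* Remove a page from the per-cpu list; caller must protect the list. */
static struct page *__rmqueue_pcplist(struct zone *zone, int migratetype,
				      unsigned int alloc_flags,
				      struct per_cpu_pages *pcp,
				      struct list_head *list);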



-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Principal Kernel Engineer at Red Hat
  LinkedIn: http://www.linkedin.com/in/brouer



Thread overview: 23+ messages
2021-02-08 15:42 alloc_pages_bulk() Chuck Lever
2021-02-08 17:50 ` Fwd: alloc_pages_bulk() Chuck Lever
2021-02-09 10:31   ` alloc_pages_bulk() Jesper Dangaard Brouer
2021-02-09 13:37     ` alloc_pages_bulk() Chuck Lever
2021-02-09 17:27     ` alloc_pages_bulk() Vlastimil Babka
2021-02-10  9:51       ` alloc_pages_bulk() Christoph Hellwig
2021-02-10  8:41     ` alloc_pages_bulk() Mel Gorman
2021-02-10 11:41       ` alloc_pages_bulk() Jesper Dangaard Brouer
2021-02-10 13:07         ` alloc_pages_bulk() Mel Gorman
2021-02-10 22:58           ` alloc_pages_bulk() Chuck Lever
2021-02-11  9:12             ` alloc_pages_bulk() Mel Gorman
2021-02-11 12:26               ` Jesper Dangaard Brouer [this message]
2021-02-15 12:00                 ` alloc_pages_bulk() Mel Gorman
2021-02-15 16:10                   ` alloc_pages_bulk() Jesper Dangaard Brouer
2021-02-22  9:42                     ` alloc_pages_bulk() Mel Gorman
2021-02-22 11:42                       ` alloc_pages_bulk() Jesper Dangaard Brouer
2021-02-22 14:08                         ` alloc_pages_bulk() Mel Gorman
2021-02-11 16:20               ` alloc_pages_bulk() Chuck Lever
2021-02-15 12:06                 ` alloc_pages_bulk() Mel Gorman
2021-02-15 16:00                   ` alloc_pages_bulk() Chuck Lever
2021-02-22 20:44                   ` alloc_pages_bulk() Jesper Dangaard Brouer
2021-02-09 22:01   ` Fwd: alloc_pages_bulk() Matthew Wilcox
2021-02-09 22:55     ` alloc_pages_bulk() Chuck Lever
