* [LSF/MM/BPF TOPIC] kernel multithreading with padata
@ 2020-02-12 22:47 Daniel Jordan
  2020-02-12 23:31 ` Jason Gunthorpe
From: Daniel Jordan @ 2020-02-12 22:47 UTC (permalink / raw)
  To: lsf-pc
  Cc: linux-mm, Dan Williams, Dave Hansen, Tim Chen, Mike Kravetz,
	Herbert Xu, Steffen Klassert, Tejun Heo, Peter Zijlstra,
	Alex Williamson, Daniel Jordan

padata has been undergoing some surgery over the last year[0] and now seems
ready for another enhancement: splitting up and multithreading CPU-intensive
kernel work.

Quoting from an earlier series[1], the problem I'm trying to solve is

  A single CPU can spend an excessive amount of time in the kernel operating
  on large amounts of data.  Often these situations arise during initialization-
  and destruction-related tasks, where the data involved scales with system
  size.  These long-running jobs can slow startup and shutdown of applications
  and the system itself while extra CPUs sit idle.

Here are the current consumers:

 - struct page init (boot, hotplug, pmem)
 - VFIO page pinning (kvm guest init)
 - fallocating a hugetlb file (database shared memory init)

On a large-memory server, DRAM page init is ~23% of kernel boot (3.5s/15.2s),
and it takes over a minute to start a VFIO-enabled kvm guest or fallocate a
hugetlb file that occupies a significant fraction of memory.  This work results
in 7-20x speedups and is currently increasing the uptime of our production
kernels.
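
To make the shape of the idea concrete, here is a minimal sketch of how a
caller might describe such a job.  It is modeled loosely on the interface in
the work-in-progress branch linked below, but every name here (padata_mt_job,
thread_fn, min_chunk, max_threads, init_pages_in_range) should be read as
illustrative rather than final:

#include <linux/padata.h>
#include <linux/mmzone.h>

/* Illustrative worker: CPU-intensive work on one chunk of the range. */
static void init_chunk(unsigned long start_pfn, unsigned long end_pfn,
		       void *arg)
{
	struct zone *zone = arg;

	/* e.g. initialize the struct pages in [start_pfn, end_pfn) */
	init_pages_in_range(zone, start_pfn, end_pfn);	/* hypothetical helper */
}

static void init_zone_pages_mt(struct zone *zone)
{
	struct padata_mt_job job = {
		.thread_fn	= init_chunk,
		.fn_arg		= zone,
		.start		= zone->zone_start_pfn,
		.size		= zone->spanned_pages,
		.align		= PAGES_PER_SECTION,	/* chunk boundary */
		.min_chunk	= PAGES_PER_SECTION,	/* smallest worthwhile piece */
		.max_threads	= cpumask_weight(cpumask_of_node(zone_to_nid(zone))),
	};

	/* padata splits [start, start + size) across threads and waits. */
	padata_do_multithreaded(&job);
}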

Future areas include munmap/exit, umount, and __ib_umem_release.  Some of these
need coarse locks broken up for multithreading (zone->lock, lru_lock).

Positive outcomes for the session would be...

 - Finding a strategy for capping the maximum number of threads in a job
   (a rough sizing sketch follows this list).

 - Agreeing on a way for the job's threads to respect resource controls.

   In the past few weeks I've been thinking about whether remote charging
   in the CPU controller is feasible (RFD to come).  I'm also considering
   creating workqueue workers directly in cgroup-specific pools instead,
   and I've previously proposed migrating workers in and out of cgroups[2].
   There's also memory policy and sched_setaffinity() to think about.

 - Checking the overall design of this thing with the mm community, given that
   current users are all mm-related.

 - Getting advice from others (hallway track) on why some pmem devices
   perform better than others under multithreading.
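
On the first bullet (thread capping), one naive sizing heuristic, offered only
as a sketch of the kind of policy that needs agreement and not as settled
code, would bound the thread count by the amount of useful work and by the
CPUs local to the job:

/*
 * Sketch only: never more threads than worthwhile chunks of work, never
 * more than the node's CPUs, always at least one.  How this should
 * interact with cgroups and other resource controls is the open question.
 */
static int job_nthreads(unsigned long size, unsigned long min_chunk,
			int max_threads, int nid)
{
	unsigned long chunks = max(size / min_chunk, 1UL);
	int cpus = cpumask_weight(cpumask_of_node(nid));

	return clamp_t(int, min_t(unsigned long, chunks, max_threads), 1, cpus);
}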

This work-in-progress branch shows what it looks like now.

    git://oss.oracle.com/git/linux-dmjordan.git padata-mt-wip-v0.2
    https://oss.oracle.com/git/gitweb.cgi?p=linux-dmjordan.git;a=shortlog;h=refs/heads/padata-mt-wip-v0.2

[0] https://lore.kernel.org/linux-crypto/?q=s%3Apadata+d%3A20190212..20200212
[1] https://lore.kernel.org/lkml/20181105165558.11698-1-daniel.m.jordan@oracle.com/
[2] https://lore.kernel.org/lkml/20190605133650.28545-1-daniel.m.jordan@oracle.com/



* Re: [LSF/MM/BPF TOPIC] kernel multithreading with padata
  2020-02-12 22:47 [LSF/MM/BPF TOPIC] kernel multithreading with padata Daniel Jordan
@ 2020-02-12 23:31 ` Jason Gunthorpe
  2020-02-13 16:13   ` Daniel Jordan
From: Jason Gunthorpe @ 2020-02-12 23:31 UTC (permalink / raw)
  To: Daniel Jordan
  Cc: lsf-pc, linux-mm, Dan Williams, Dave Hansen, Tim Chen,
	Mike Kravetz, Herbert Xu, Steffen Klassert, Tejun Heo,
	Peter Zijlstra, Alex Williamson

On Wed, Feb 12, 2020 at 05:47:31PM -0500, Daniel Jordan wrote:
> padata has been undergoing some surgery over the last year[0] and now seems
> ready for another enhancement: splitting up and multithreading CPU-intensive
> kernel work.
> 
> Quoting from an earlier series[1], the problem I'm trying to solve is
> 
>   A single CPU can spend an excessive amount of time in the kernel operating
>   on large amounts of data.  Often these situations arise during initialization-
>   and destruction-related tasks, where the data involved scales with system
>   size.  These long-running jobs can slow startup and shutdown of applications
>   and the system itself while extra CPUs sit idle.
> 
> Here are the current consumers:
> 
>  - struct page init (boot, hotplug, pmem)
>  - VFIO page pinning (kvm guest init)
>  - fallocating a hugetlb file (database shared memory init)
> 
> On a large-memory server, DRAM page init is ~23% of kernel boot (3.5s/15.2s),
> and it takes over a minute to start a VFIO-enabled kvm guest or fallocate a
> hugetlb file that occupies a significant fraction of memory.  This work results
> in 7-20x speedups and is currently increasing the uptime of our production
> kernels.
> 
> Future areas include munmap/exit, umount, and __ib_umem_release.  Some of these
> need coarse locks broken up for multithreading (zone->lock, lru_lock).

I'm aware of this ib_umem_release request; it would be interesting to
see.  The main workload here is put_page and dma_unmap.

Jason
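
For context, the release path in question is roughly the loop below.  This is
a simplified approximation of __ib_umem_release(); the exact helpers and
ib_umem fields vary across kernel versions, but the dma unmap and the
per-page dirty/put work are the CPU-heavy parts a padata job could split
across the scatterlist:

static void __ib_umem_release(struct ib_device *dev, struct ib_umem *umem,
			      int dirty)
{
	struct sg_page_iter sg_iter;
	struct page *page;

	if (umem->nmap > 0)
		ib_dma_unmap_sg(dev, umem->sg_head.sgl, umem->sg_nents,
				DMA_BIDIRECTIONAL);

	for_each_sg_page(umem->sg_head.sgl, &sg_iter, umem->sg_nents, 0) {
		page = sg_page_iter_page(&sg_iter);
		if (umem->writable && dirty)
			set_page_dirty_lock(page);
		put_page(page);
	}

	sg_free_table(&umem->sg_head);
}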



* Re: [LSF/MM/BPF TOPIC] kernel multithreading with padata
  2020-02-12 23:31 ` Jason Gunthorpe
@ 2020-02-13 16:13   ` Daniel Jordan
From: Daniel Jordan @ 2020-02-13 16:13 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Daniel Jordan, lsf-pc, linux-mm, Dan Williams, Dave Hansen,
	Tim Chen, Mike Kravetz, Herbert Xu, Steffen Klassert, Tejun Heo,
	Peter Zijlstra, Alex Williamson

On Wed, Feb 12, 2020 at 07:31:00PM -0400, Jason Gunthorpe wrote:
> On Wed, Feb 12, 2020 at 05:47:31PM -0500, Daniel Jordan wrote:
> > padata has been undergoing some surgery over the last year[0] and now seems
> > ready for another enhancement: splitting up and multithreading CPU-intensive
> > kernel work.
> > 
> > Quoting from an earlier series[1], the problem I'm trying to solve is
> > 
> >   A single CPU can spend an excessive amount of time in the kernel operating
> >   on large amounts of data.  Often these situations arise during initialization-
> >   and destruction-related tasks, where the data involved scales with system
> >   size.  These long-running jobs can slow startup and shutdown of applications
> >   and the system itself while extra CPUs sit idle.
> > 
> > Here are the current consumers:
> > 
> >  - struct page init (boot, hotplug, pmem)
> >  - VFIO page pinning (kvm guest init)
> >  - fallocating a hugetlb file (database shared memory init)
> > 
> > On a large-memory server, DRAM page init is ~23% of kernel boot (3.5s/15.2s),
> > and it takes over a minute to start a VFIO-enabled kvm guest or fallocate a
> > hugetlb file that occupies a significant fraction of memory.  This work results
> > in 7-20x speedups and is currently increasing the uptime of our production
> > kernels.
> > 
> > Future areas include munmap/exit, umount, and __ib_umem_release.  Some of these
> > need coarse locks broken up for multithreading (zone->lock, lru_lock).
> 
> I'm aware of this ib_umem_release request; it would be interesting to
> see.  The main workload here is put_page and dma_unmap.

Ah yes, I see it gets all the way down to zone->lock, so I should've said _all_
of the future cases need coarse locks broken.

By the way, there's an idea for dealing with zone->lock that I haven't yet had
time to look at.

http://lkml.kernel.org/r/20181018111632.GM5819@techsingularity.net

