Kernel Newbies Archive on lore.kernel.org
* Understanding the locking behavior of msync
@ 2021-03-24 11:56 Maximilian Böther
  2021-03-27  1:57 ` Mulyadi Santosa
  0 siblings, 1 reply; 2+ messages in thread
From: Maximilian Böther @ 2021-03-24 11:56 UTC (permalink / raw)
  To: kernelnewbies

Hello!

I am investigating an application that writes random data in fixed-size 
chunks (e.g., 4 KiB) to random locations in a large buffer file. Several 
processes (not threads) do this, and each process is assigned its own 
buffer file.

If I use mmap+msync to write and persist data to disk, I see a 
performance spike at 16 processes and a drop with more processes (32). 
The CPU has 32 logical cores in total, and we are not CPU-bound.

If I use open+write+fsync, I do not see such a spike but rather a 
performance plateau (and mmap is slower than open/write).

I've read in several places [1,2] that both mmap and msync can take 
locks. With VTune, I confirmed that we are indeed spinning on locks, 
spending most of the time in the clear_page_erms and xas_load functions.

However, reading the source code of msync [3], I cannot tell whether 
these locks are global or per-file. The paper [2] states that the locks 
are on radix trees within the kernel that are per-file; however, since I 
observe spinlock contention even though each process works on its own 
file, I suspect that some locks may be global.

Do you have an explanation for the spike at 16 processes with mmap, or 
any input on the locking behavior of msync?

Thank you!

Best,
Maximilian Böther

[1] 
https://kb.pmem.io/development/100000025-Why-msync-is-less-optimal-for-persistent-memory/ 
- I know it's about PMem, but the lock argument is important

[2] Optimizing Memory-mapped I/O for Fast Storage Devices, Papagiannis 
et al., ATC '20

[3] https://elixir.bootlin.com/linux/latest/source/mm/msync.c

_______________________________________________
Kernelnewbies mailing list
Kernelnewbies@kernelnewbies.org
https://lists.kernelnewbies.org/mailman/listinfo/kernelnewbies


* Re: Understanding the locking behavior of msync
  2021-03-24 11:56 Understanding the locking behavior of msync Maximilian Böther
@ 2021-03-27  1:57 ` Mulyadi Santosa
  0 siblings, 0 replies; 2+ messages in thread
From: Mulyadi Santosa @ 2021-03-27  1:57 UTC (permalink / raw)
  To: kernelnewbies


On Fri, Mar 26, 2021, 22:05 Maximilian Böther 
<maximilian.boether@student.hpi.de> wrote:

> Hello!
>
> I am investigating an application that writes random data in fixed-size
> chunks (e.g., 4 KiB) to random locations in a large buffer file. Several
> processes (not threads) do this, and each process is assigned its own
> buffer file.
>
> If I use mmap+msync to write and persist data to disk, I see a
> performance spike at 16 processes and a drop with more processes (32).
> The CPU has 32 logical cores in total, and we are not CPU-bound.
>
> If I use open+write+fsync, I do not see such a spike but rather a
> performance plateau (and mmap is slower than open/write).
>
> I've read in several places [1,2] that both mmap and msync can take
> locks. With VTune, I confirmed that we are indeed spinning on locks,
> spending most of the time in the clear_page_erms and xas_load functions.
>
> However, reading the source code of msync [3], I cannot tell whether
> these locks are global or per-file. The paper [2] states that the locks
> are on radix trees within the kernel that are per-file; however, since I
> observe spinlock contention even though each process works on its own
> file, I suspect that some locks may be global.
>
> Do you have an explanation for the spike at 16 processes with mmap, or
> any input on the locking behavior of msync?
>
> Thank you!
>
> Best,
> Maximilian Böther
>
> [1]
>
> https://kb.pmem.io/development/100000025-Why-msync-is-less-optimal-for-persistent-memory/
> - I know it's about PMem, but the lock argument is important
>
> [2] Optimizing Memory-mapped I/O for Fast Storage Devices, Papagiannis
> et al., ATC '20
>
> [3] https://elixir.bootlin.com/linux/latest/source/mm/msync.c
>
>

Is it NUMA?

