Kernel Newbies archive on lore.kernel.org
 help / color / Atom feed
From: "Valdis Klētnieks" <valdis.kletnieks@vt.edu>
To: Bharath Vedartham <linux.bhar@gmail.com>
Cc: kernelnewbies@kernelnewbies.org
Subject: Re: Overwriting copy functionality in filesystem
Date: Thu, 28 Mar 2019 16:07:02 -0400
Message-ID: <24069.1553803622@turing-police> (raw)
In-Reply-To: <20190328183017.GA3568@bharath12345-Inspiron-5559>

On Fri, 29 Mar 2019 00:00:17 +0530, Bharath Vedartham said:

> I was thinking of a use case where we are copying a huge file (say 100
> GB), if we do copy-on-write we can speed up /bin/cp for such files i
> feel. Any comments on this?

Hmm.. wait a minute.  What definition of "copy on write" are you using?

Hint - if you're copying an *entire* 100GB file, the *fastest* way is to simply
make a second hard link to the file. If you're determined to make an entire
second copy, you're going to be reading 100GB and writing 100GB, and the
exact details aren't going to matter all that much.

Now, where you can get clever is if you create your 100GB file, and then
somebody only changes 8K of the file.  There's no need to copy all 100GB into a
new file if you are able to record "oh, and this 8K got changed". You only need
to write the 8K of changes, and some metadata.

(Similar tricks are used for shared libraries and pre-zero'ed storage.  Everybody
gets a reference to the same copy of the page(s) in memory - until somebody
scribbles on a page.

So say you have a 30MB shared object in memory, with 5 users.  That's 5 references
to the same data.  Now one user writes to it.  The system catches that write (usually
via a page fault), copies just the one page to a new page, and then lets the write to the new
page complete.  Now we have 5 users that all have references to the same (30M-4K)
of data, 4 users that have a reference to the old copy of that 4K, and one user that
has a reference to the modified copy of that 4K.

https://en.wikipedia.org/wiki/Copy-on-write

_______________________________________________
Kernelnewbies mailing list
Kernelnewbies@kernelnewbies.org
https://lists.kernelnewbies.org/mailman/listinfo/kernelnewbies

  reply index

Thread overview: 9+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-03-23 16:59 Bharath Vedartham
2019-03-23 19:01 ` Valdis Klētnieks
2019-03-24 13:18   ` Bharath Vedartham
2019-03-24 14:06     ` Valdis Klētnieks
2019-03-28 18:30       ` Bharath Vedartham
2019-03-28 20:07         ` Valdis Klētnieks [this message]
2019-03-23 19:03 ` Bernd Petrovitsch
2019-03-24 13:21   ` Bharath Vedartham
2019-03-23 19:05 ` Greg KH

Reply instructions:

You may reply publically to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=24069.1553803622@turing-police \
    --to=valdis.kletnieks@vt.edu \
    --cc=kernelnewbies@kernelnewbies.org \
    --cc=linux.bhar@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Kernel Newbies archive on lore.kernel.org

Archives are clonable:
	git clone --mirror https://lore.kernel.org/kernelnewbies/0 kernelnewbies/git/0.git

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V2 kernelnewbies kernelnewbies/ https://lore.kernel.org/kernelnewbies \
		kernelnewbies@kernelnewbies.org kernelnewbies@archiver.kernel.org
	public-inbox-index kernelnewbies


Newsgroup available over NNTP:
	nntp://nntp.lore.kernel.org/org.kernelnewbies.kernelnewbies


AGPL code for this site: git clone https://public-inbox.org/ public-inbox