From: "Valdis Klētnieks" <valdis.kletnieks@vt.edu>
To: Bharath Vedartham <linux.bhar@gmail.com>
Cc: kernelnewbies@kernelnewbies.org
Subject: Re: Overwriting copy functionality in filesystem
Date: Thu, 28 Mar 2019 16:07:02 -0400
Message-ID: <24069.1553803622@turing-police>
In-Reply-To: <20190328183017.GA3568@bharath12345-Inspiron-5559>

On Fri, 29 Mar 2019 00:00:17 +0530, Bharath Vedartham said:

> I was thinking of a use case where we are copying a huge file (say 100
> GB), if we do copy-on-write we can speed up /bin/cp for such files i
> feel. Any comments on this?

Hmm.. wait a minute.  What definition of "copy on write" are you using?

Hint - if you're copying an *entire* 100GB file, the *fastest* way is to simply
make a second hard link to the file. If you're determined to make an entire
second copy, you're going to be reading 100GB and writing 100GB, and the
exact details aren't going to matter all that much.
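
To make the difference concrete, here is a minimal userspace sketch (Python,
assuming a POSIX filesystem that supports hard links; the helper name
link_vs_copy and the file names are made up for illustration).  It creates a
small file, hard-links it, and then copies it the slow way, counting the bytes
that actually had to move:

```python
import os
import tempfile


def link_vs_copy(size=1 << 20, chunk=64 * 1024):
    """Hard-link and byte-copy a scratch file; return their stat results.

    Returns (orig_stat, link_stat, copy_stat, bytes_moved_by_copy).
    """
    with tempfile.TemporaryDirectory() as workdir:
        orig = os.path.join(workdir, "orig.bin")
        linked = os.path.join(workdir, "linked.bin")
        copied = os.path.join(workdir, "copied.bin")

        with open(orig, "wb") as f:
            f.write(b"x" * size)

        # Hard link: a second directory entry for the same inode.
        # No file data is read or written.
        os.link(orig, linked)

        # Real copy: every byte is read once and written once.
        bytes_moved = 0
        with open(orig, "rb") as src, open(copied, "wb") as dst:
            while True:
                buf = src.read(chunk)
                if not buf:
                    break
                dst.write(buf)
                bytes_moved += len(buf)

        return os.stat(orig), os.stat(linked), os.stat(copied), bytes_moved


if __name__ == "__main__":
    o, l, c, moved = link_vs_copy()
    print("same inode for link:", o.st_ino == l.st_ino, "nlink:", o.st_nlink)
    print("separate inode for copy:", c.st_ino != o.st_ino, "bytes moved:", moved)
```

Note that the hard link is a second name for the same data, not an independent
copy: a later write through either name is visible through both.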

Now, where you can get clever is if you create your 100GB file, and then
somebody only changes 8K of the file.  There's no need to copy all 100GB into a
new file if you are able to record "oh, and this 8K got changed". You only need
to write the 8K of changes, and some metadata.
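
The "record only what changed" idea can be sketched as a toy in-memory model.
This is not how any particular filesystem implements it; the class name
CowFile, the 4K block size, and the dict used as "metadata" are all
assumptions made for illustration, and writes are assumed to stay within the
size of the original data:

```python
BLOCK = 4096  # assumed block size for this toy model


class CowFile:
    """Toy copy-on-write view: a shared base plus privately changed blocks."""

    def __init__(self, base):
        self.base = base    # shared data, never modified
        self.changed = {}   # block index -> private bytearray copy

    def write(self, offset, data):
        if offset < 0 or offset + len(data) > len(self.base):
            raise ValueError("this sketch does not grow the file")
        pos = 0
        while pos < len(data):
            idx, off = divmod(offset + pos, BLOCK)
            n = min(BLOCK - off, len(data) - pos)
            blk = self.changed.get(idx)
            if blk is None:
                # First write to this block: copy just this one block.
                start = idx * BLOCK
                blk = bytearray(self.base[start:start + BLOCK])
                self.changed[idx] = blk
            blk[off:off + n] = data[pos:pos + n]
            pos += n

    def read(self, offset, length):
        out = bytearray()
        pos, end = offset, min(offset + length, len(self.base))
        while pos < end:
            idx, off = divmod(pos, BLOCK)
            n = min(BLOCK - off, end - pos)
            blk = self.changed.get(idx)
            if blk is None:
                start = idx * BLOCK
                out += self.base[start + off:start + off + n]
            else:
                out += blk[off:off + n]
            pos += n
        return bytes(out)


if __name__ == "__main__":
    base = bytes(1 << 20)                # stand-in for the huge file
    f = CowFile(base)
    f.write(8192, b"\xff" * 8192)        # somebody changes 8K
    print("private blocks:", len(f.changed), "of", len(base) // BLOCK)
```

After the 8K write, only the two touched 4K blocks have private copies; every
other read still comes from the shared base.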

Similar tricks are used for shared libraries and pre-zeroed storage.  Everybody
gets a reference to the same copy of the page(s) in memory - until somebody
scribbles on a page.

So say you have a 30MB shared object in memory, with 5 users.  That's 5
references to the same data.  Now one user writes to it.  The system catches
that write (usually via a page fault), copies just the one page to a new page,
and then lets the write complete against the new page.  Now we have 5 users
that all have references to the same (30MB - 4K) of data, 4 users that have a
reference to the old copy of that 4K, and one user that has a reference to the
modified copy of that 4K.
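
You can watch the visible effect of this from userspace with private file
mappings.  The following is a minimal sketch, assuming a Unix-like system
where Python's mmap.ACCESS_COPY gives a private (copy-on-write) mapping; the
function name private_mappings_demo is made up, and a scratch file stands in
for the shared object.  It only shows the semantics (who sees which data), not
the page fault itself:

```python
import mmap
import os
import tempfile


def private_mappings_demo(users=5, pages=8):
    """Map one file privately `users` times, write via the first mapping.

    Returns (first four bytes seen by each mapping, first four bytes on disk).
    """
    size = pages * mmap.PAGESIZE
    fd, path = tempfile.mkstemp()
    try:
        os.write(fd, b"A" * size)

        # Each "user" gets its own copy-on-write mapping of the same file.
        maps = [mmap.mmap(fd, size, access=mmap.ACCESS_COPY)
                for _ in range(users)]

        # One user scribbles on the first page.
        maps[0][0:4] = b"ZZZZ"

        seen = [bytes(m[0:4]) for m in maps]
        on_disk = os.pread(fd, 4, 0)

        for m in maps:
            m.close()
        return seen, on_disk
    finally:
        os.close(fd)
        os.unlink(path)


if __name__ == "__main__":
    seen, on_disk = private_mappings_demo()
    print("per-user view:", seen)
    print("file contents:", on_disk)
```

Only the writer sees its modification; the other four mappings and the file
itself still show the original bytes.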

https://en.wikipedia.org/wiki/Copy-on-write
