* [RFC PATCH v5 0/5] Online data deduplication
@ 2013-07-31 15:37 Liu Bo
  2013-07-31 15:37 ` [RFC PATCH v5 1/5] Btrfs: skip merge part for delayed data refs Liu Bo
                   ` (6 more replies)
  0 siblings, 7 replies; 30+ messages in thread
From: Liu Bo @ 2013-07-31 15:37 UTC
  To: linux-btrfs

Data deduplication is a specialized data compression technique for eliminating
duplicate copies of repeating data.[1]
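
As a rough illustration of the idea (not the code in this patch set), block-level
dedup can be sketched as below. The fixed 4096-byte block size, the SHA-256
digest and the names dedup_store/read_back are all assumptions made for the
example.

```python
import hashlib

BLOCK_SIZE = 4096  # assumed dedup block size for this sketch


def dedup_store(data: bytes, block_size: int = BLOCK_SIZE):
    """Split data into fixed-size blocks and keep one copy per distinct block.

    Returns (store, refs): store maps a SHA-256 digest to the block bytes,
    refs lists the digest of every block in file order.
    """
    store = {}
    refs = []
    for off in range(0, len(data), block_size):
        block = data[off:off + block_size]
        digest = hashlib.sha256(block).digest()
        # Only the first block with a given digest is stored; later
        # duplicates just add another reference to it.
        store.setdefault(digest, block)
        refs.append(digest)
    return store, refs


def read_back(store, refs) -> bytes:
    """Rebuild the original data from the block references."""
    return b"".join(store[d] for d in refs)
```

With three identical blocks followed by one different block, refs has four
entries while store holds only two blocks.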

This patch set is also related to "Content based storage" in project ideas[2].

PATCH 1 fixes a hang seen with deduplication on, but it is also useful without
dedup in practical use.

PATCHES 2 and 3 target delayed refs' scalability problems, which are
uncovered by the dedup feature.

PATCH 4 is a speed-up for dedup that skips qgroup accounting when quota is
not enabled.

PATCH 5 is the real work: it contains all the details of the dedup
implementation.

Plus, there is also a btrfs-progs patch which helps to enable/disable the dedup
feature.

TODO:
* a bit-to-bit comparison callback.

All comments are welcome!

[1]: http://en.wikipedia.org/wiki/Data_deduplication
[2]: https://btrfs.wiki.kernel.org/index.php/Project_ideas#Content_based_storage

v4->v5:
- go back to one dedup key with a special backref for the dedup tree because
  the disk format understands backrefs well.
- fix a fsync hang with dedup enabled.
- rebase onto the latest btrfs.


Liu Bo (5):
  Btrfs: skip merge part for delayed data refs
  Btrfs: improve the delayed refs process in rm case
  Btrfs: introduce a head ref rbtree
  Btrfs: disable qgroups accounting when quata_enable is 0
  Btrfs: online data deduplication

 fs/btrfs/backref.c         |    9 +
 fs/btrfs/ctree.h           |   59 ++++
 fs/btrfs/delayed-ref.c     |  141 +++++++----
 fs/btrfs/delayed-ref.h     |    8 +
 fs/btrfs/disk-io.c         |   30 ++
 fs/btrfs/extent-tree.c     |  196 ++++++++++++--
 fs/btrfs/extent_io.c       |   29 ++-
 fs/btrfs/extent_io.h       |   16 ++
 fs/btrfs/file-item.c       |  217 +++++++++++++++
 fs/btrfs/inode.c           |  637 ++++++++++++++++++++++++++++++++++++++------
 fs/btrfs/ioctl.c           |   93 +++++++
 fs/btrfs/ordered-data.c    |   36 ++-
 fs/btrfs/ordered-data.h    |   11 +-
 fs/btrfs/qgroup.c          |    6 +
 fs/btrfs/relocation.c      |    3 +
 fs/btrfs/super.c           |   27 ++-
 fs/btrfs/transaction.c     |    4 +-
 include/uapi/linux/btrfs.h |    5 +
 18 files changed, 1356 insertions(+), 171 deletions(-)

-- 
1.7.7


* [RFC PATCH v8 00/14] Online(inband) data deduplication
@ 2013-12-30  8:12 Liu Bo
  2013-12-30  8:12 ` [PATCH] Btrfs-progs: add dedup subcommand Liu Bo
  0 siblings, 1 reply; 30+ messages in thread
From: Liu Bo @ 2013-12-30  8:12 UTC
  To: linux-btrfs; +Cc: Marcel Ritter, Christian Robert, alanqk

Hello,

Here is the New Year patch bomb :-)

Data deduplication is a specialized data compression technique for eliminating
duplicate copies of repeating data.[1]

This patch set is also related to "Content based storage" in project ideas[2].
It introduces inband data deduplication for btrfs, called dedup/dedupe for short.

PATCH 1 fixes a hang seen with deduplication on, but it is also useful without
dedup in practical use.

PATCHES 2 and 3 target delayed refs' scalability problems, which are
uncovered by the dedup feature.

PATCH 4 is a speed-up for dedup that skips qgroup accounting when quota is
not enabled.

PATCHES 5-8 are the preparation work for the dedup implementation.

PATCH 9 shows how we implement the dedup feature.

PATCH 10 fixes a backref walking bug with dedup.

PATCH 11 fixes a free space bug with dedup extents in error handling.

PATCH 12 adds the ioctl to control the dedup feature.

PATCH 13 fixes the metadata ENOSPC problem with dedup which has been there
WAY TOO LONG.

PATCH 14 fixes a race bug on dedup writes.

And there is also a btrfs-progs patch (PATCH 15) which offers all details about
how to control the dedup feature.

I've tested this with xfstests by adding an inline dedup 'enable & on' in xfstests'
mount and scratch_mount.

TODO:
* a bit-to-bit comparison callback.
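
A minimal sketch of what such a callback could look like, assuming a digest
index is consulted first and the comparison only runs on a digest hit. The
in-memory dict index and the names find_duplicate, sha256_digest and
bytes_equal are hypothetical and not taken from the patches.

```python
import hashlib


def sha256_digest(block: bytes) -> bytes:
    """Default hash used to look up candidate duplicates (assumed SHA-256)."""
    return hashlib.sha256(block).digest()


def bytes_equal(a: bytes, b: bytes) -> bool:
    """Bit-for-bit comparison of two blocks."""
    return a == b


def find_duplicate(index, block, hash_fn=sha256_digest, compare=bytes_equal):
    """Return the stored block that `block` may share, or None.

    index maps digest -> stored block bytes. When a compare callback is
    given, a digest hit is accepted only if the callback confirms that
    the contents are identical; compare=None trusts the digest alone.
    """
    candidate = index.get(hash_fn(block))
    if candidate is None:
        return None
    if compare is not None and not compare(candidate, block):
        return None  # same digest, different contents: do not share
    return candidate
```

Passing the comparison as a callback keeps the lookup path unchanged for users
who are willing to trust the digest, at the cost of an extra read and compare
for those who are not.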

All comments are welcome!


[1]: http://en.wikipedia.org/wiki/Data_deduplication
[2]: https://btrfs.wiki.kernel.org/index.php/Project_ideas#Content_based_storage

v8:
- fix the race crash of dedup ref again.
- fix the metadata ENOSPC problem with dedup.

v7:
- rebase onto the latest btrfs
- break a big patch into smaller ones to make reviewers happy.
- kill mount options of dedup and use ioctl method instead.
- fix two crashes caused by the special dedup ref

For former patch sets:
v6: http://thread.gmane.org/gmane.comp.file-systems.btrfs/27512
v5: http://thread.gmane.org/gmane.comp.file-systems.btrfs/27257
v4: http://thread.gmane.org/gmane.comp.file-systems.btrfs/25751
v3: http://comments.gmane.org/gmane.comp.file-systems.btrfs/25433
v2: http://comments.gmane.org/gmane.comp.file-systems.btrfs/24959

Liu Bo (14):
  Btrfs: skip merge part for delayed data refs
  Btrfs: improve the delayed refs process in rm case
  Btrfs: introduce a head ref rbtree
  Btrfs: disable qgroups accounting when quata_enable is 0
  Btrfs: introduce dedup tree and relatives
  Btrfs: introduce dedup tree operations
  Btrfs: introduce dedup state
  Btrfs: make ordered extent aware of dedup
  Btrfs: online(inband) data dedup
  Btrfs: skip dedup reference during backref walking
  Btrfs: don't return space for dedup extent
  Btrfs: add ioctl of dedup control
  Btrfs: fix dedupe 'ENOSPC' problem
  Btrfs: fix a crash of dedup ref

 fs/btrfs/backref.c           |   9 +
 fs/btrfs/ctree.c             |   2 +-
 fs/btrfs/ctree.h             |  86 ++++++
 fs/btrfs/delayed-ref.c       | 161 +++++++----
 fs/btrfs/delayed-ref.h       |   8 +
 fs/btrfs/disk-io.c           |  40 +++
 fs/btrfs/extent-tree.c       | 208 ++++++++++++--
 fs/btrfs/extent_io.c         |  22 +-
 fs/btrfs/extent_io.h         |  16 ++
 fs/btrfs/file-item.c         | 244 +++++++++++++++++
 fs/btrfs/inode.c             | 635 ++++++++++++++++++++++++++++++++++++++-----
 fs/btrfs/ioctl.c             | 167 ++++++++++++
 fs/btrfs/ordered-data.c      |  38 ++-
 fs/btrfs/ordered-data.h      |  13 +-
 fs/btrfs/qgroup.c            |   3 +
 fs/btrfs/relocation.c        |   3 +
 fs/btrfs/transaction.c       |   4 +-
 include/trace/events/btrfs.h |   3 +-
 include/uapi/linux/btrfs.h   |  11 +
 19 files changed, 1501 insertions(+), 172 deletions(-)

-- 
1.8.2.1


* [RFC PATCH V4 0/2] Online data deduplication
@ 2013-05-14 12:08 Liu Bo
  2013-05-14 12:08 ` [PATCH] Btrfs-progs: add dedup subcommand Liu Bo
  0 siblings, 1 reply; 30+ messages in thread
From: Liu Bo @ 2013-05-14 12:08 UTC
  To: linux-btrfs; +Cc: dsterba, jbacik, g2p.code

Data deduplication is a specialized data compression technique for eliminating
duplicate copies of repeating data.[1]

This patch set is also related to "Content based storage" in project ideas[2].

PATCH 1 fixes a hang seen with deduplication on, but it is also useful without
deduplication in practical use.

For more implementation details, please refer to PATCH 2.

Plus, there is also a btrfs-progs patch which helps to enable/disable the dedup
feature.

TODO:
* a bit-to-bit comparison callback.

All comments are welcome!

[1]: http://en.wikipedia.org/wiki/Data_deduplication
[2]: https://btrfs.wiki.kernel.org/index.php/Project_ideas#Content_based_storage

v4:
  * add an INCOMPAT flag so that old kernels won't mount a dedup btrfs.
  * elaborate error handling.
  * address a compress bug.
  * address an issue of dedup flag on extent state tree.
  * add new dedup ioctl interface.
v3:
  * add COMPRESS support
  * add a real ioctl to enable dedup feature
  * change the maximum allowed dedup blocksize to 128k because of compression
    range limit
v2:
  * To avoid enlarging the file extent item's size, add another index key used
    for freeing dedup extent.
  * Freeing a dedup extent now works like deleting a checksum.
  * Add support for alternative deduplication blocksize larger than PAGESIZE.
  * Add a mount option to set deduplication blocksize.
  * Add support for those writes that are smaller than deduplication blocksize.
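
To make the blocksize rules in the changelog above concrete, here is a small
sketch. It assumes a 4096-byte page as the lower bound, takes the 128K upper
bound from the v3 notes, and assumes the tail of a write that is shorter than
the dedup blocksize is handled separately from the full blocks. The helper
names parse_dedup_bs and split_write are hypothetical.

```python
PAGE_SIZE = 4096            # assumed page size, used as the lower bound
MAX_DEDUP_BS = 128 * 1024   # maximum dedup blocksize from the v3 changelog


def parse_dedup_bs(value: str) -> int:
    """Parse a size such as "128K" or "8192" into bytes and range-check it."""
    text = value.strip().upper()
    if text.endswith("K"):
        size = int(text[:-1]) * 1024
    else:
        size = int(text)
    if size < PAGE_SIZE or size > MAX_DEDUP_BS:
        raise ValueError(
            f"dedup blocksize {size} outside [{PAGE_SIZE}, {MAX_DEDUP_BS}]"
        )
    return size


def split_write(length: int, dedup_bs: int):
    """Return (full_blocks, tail_bytes) for a write of `length` bytes.

    full_blocks counts the dedup-sized blocks; tail_bytes is the remainder
    that is smaller than one dedup block.
    """
    return divmod(length, dedup_bs)
```

For example, a 300000-byte write with dedup_bs=128K covers two full blocks and
leaves a 37856-byte tail.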

-----------------------------------------------------------
* HOW To turn deduplication on:

There are 2 steps you need to do before using it,
1) mount /dev/disk /mnt_of_your_btrfs -o dedup
   (or mount /dev/disk /mnt_of_your_btrfs -o dedup_bs=128K)
2) btrfs dedup register /mnt_of_your_btrfs
-----------------------------------------------------------
* HOW To turn deduplication off:

Just mount your btrfs without "-o dedup" or "-o dedup_bs=xxxK"
-----------------------------------------------------------
* HOW To disable deduplication completely:

There are 2 steps you need to do,
1) mount your btrfs WITHOUT "-o dedup" or "-o dedup_bs=xxxK"

2) btrfs dedup unregister /mnt_of_your_btrfs
(NOTE: 'unregister' won't work unless you do step 1 FIRST.)
-----------------------------------------------------------

Liu Bo (2):
  Btrfs: skip merge part for delayed data refs
  Btrfs: online data deduplication

 fs/btrfs/ctree.h           |   63 +++++
 fs/btrfs/delayed-ref.c     |    7 +
 fs/btrfs/disk-io.c         |   34 +++-
 fs/btrfs/extent-tree.c     |    9 +
 fs/btrfs/extent_io.c       |   29 ++-
 fs/btrfs/extent_io.h       |   16 ++
 fs/btrfs/file-item.c       |  274 +++++++++++++++++++
 fs/btrfs/inode.c           |  630 ++++++++++++++++++++++++++++++++++++++------
 fs/btrfs/ioctl.c           |   93 +++++++
 fs/btrfs/ordered-data.c    |   34 ++-
 fs/btrfs/ordered-data.h    |   11 +-
 fs/btrfs/super.c           |   27 ++-
 include/uapi/linux/btrfs.h |    5 +
 13 files changed, 1141 insertions(+), 91 deletions(-)

-- 
1.7.7



end of thread, other threads:[~2014-01-17 16:14 UTC | newest]

Thread overview: 30+ messages
2013-07-31 15:37 [RFC PATCH v5 0/5] Online data deduplication Liu Bo
2013-07-31 15:37 ` [RFC PATCH v5 1/5] Btrfs: skip merge part for delayed data refs Liu Bo
2013-07-31 15:37 ` [RFC PATCH v5 2/5] Btrfs: improve the delayed refs process in rm case Liu Bo
2013-07-31 16:45   ` Stefan Behrens
2013-07-31 15:37 ` [RFC PATCH v5 3/5] Btrfs: introduce a head ref rbtree Liu Bo
2013-07-31 21:19   ` Zach Brown
2013-07-31 15:37 ` [RFC PATCH v5 4/5] Btrfs: disable qgroups accounting when quota is off Liu Bo
2013-08-05 12:34   ` Jan Schmidt
2013-08-05 14:18     ` Liu Bo
2013-08-05 15:10       ` Jan Schmidt
2013-08-06  2:25         ` Liu Bo
2013-07-31 15:37 ` [RFC PATCH v5 5/5] Btrfs: online data deduplication Liu Bo
2013-07-31 22:50   ` Zach Brown
2013-08-01 10:14     ` Liu Bo
2013-08-01 18:35       ` Zach Brown
2013-07-31 15:37 ` [PATCH] Btrfs-progs: add dedup subcommand Liu Bo
2013-07-31 16:30   ` Stefan Behrens
2013-08-01 10:17     ` Liu Bo
2013-08-01 22:01   ` Mark Fasheh
2013-08-02  2:29     ` Liu Bo
2013-07-31 21:20 ` [RFC PATCH v5 0/5] Online data deduplication Josef Bacik
2013-08-01 10:16   ` Liu Bo
  -- strict thread matches above, loose matches on Subject: below --
2013-12-30  8:12 [RFC PATCH v8 00/14] Online(inband) " Liu Bo
2013-12-30  8:12 ` [PATCH] Btrfs-progs: add dedup subcommand Liu Bo
2013-12-30 11:34   ` Martin Steigerwald
2013-12-31  3:18     ` Liu Bo
2013-12-31  3:24     ` Kai Krakow
2014-01-14 17:34   ` David Sterba
2014-01-15  1:35     ` Liu Bo
2014-01-17 16:14       ` David Sterba
2013-05-14 12:08 [RFC PATCH V4 0/2] Online data deduplication Liu Bo
2013-05-14 12:08 ` [PATCH] Btrfs-progs: add dedup subcommand Liu Bo
