From: Qu Wenruo <quwenruo@cn.fujitsu.com>
To: Nicholas D Steeves <nsteeves@gmail.com>,
	Btrfs BTRFS <linux-btrfs@vger.kernel.org>
Subject: Re: [PATCH v8 00/27][For 4.7] Btrfs: Add inband (write time) de-duplication framework
Date: Wed, 6 Apr 2016 13:22:44 +0800
Message-ID: <57049D24.80300@cn.fujitsu.com>
In-Reply-To: <CAD=QJKjvKVZTdqVVZJGtve+9teRXgGpv4chY_NV7P5BuFwB1Gw@mail.gmail.com>



Nicholas D Steeves wrote on 2016/04/05 23:47 -0400:
> On 4 April 2016 at 12:55, David Sterba <dsterba@suse.cz> wrote:
>>>>> Not exactly. If we are using an unsafe hash, e.g. MD5, we will use MD5
>>>>> only, for both the in-memory and on-disk backends. No SHA256 again.
>>>>
>>>> I'm proposing unsafe but fast, which MD5 is not. Look for xxhash or
>>>> murmur. As they're both orders of magnitude faster than sha1/md5, we can
>>>> actually hash both to reduce the collisions.
>>>
>>> I don't quite like the idea of using 2 hashes rather than 1.
>>> Yes, some programs like rsync use this method, but it also involves a
>>> lot of details, like the order to restore them on disk.
>>
>> I'm considering fast-but-unsafe hashes for the in-memory backend, where
>> the speed matters and we cannot hide the slow sha256 calculations behind
>> the IO (i.e. no point in saving microseconds if the IO is going to take
>> milliseconds).
>>
>>>>> In that case, for the MD5 hit case, we will do a full byte-to-byte
>>>>> comparison. It may be slow or fast, depending on the cache.
>>>>
>>>> If the probability of hash collision is low, the number of needed
>>>> byte-to-byte comparisons is also low.
>
> It is unlikely that I will use dedupe, but I imagine your work will
> apply to the following wishlist:
>
> 1. Allow disabling of memory-backend hash via a kernel argument,
> sysctl, or mount option for those of us who have ECC RAM.
>      * page_cache never gets pushed to swap, so this should be safe, no?

Why not use the current ioctl to disable dedupe?

And why is it related to ECC RAM? To avoid memory corruption, which
would eventually lead to file corruption?
If so, it makes sense.

Also, I didn't get your point about page_cache.
The hash pool doesn't use the page cache. We just use kmalloc, and that
memory won't be swapped out.
The file page cache is not affected at all.


> 2. Implementing an intelligent cache so that it's possible to offset
> the cost of hashing the most actively read data.  I'm guessing there's
> already some sort of weighted cache eviction algorithm in place, but I
> don't yet know how to look into it, let alone enough to leverage it...

I'm not quite a fan of such an intelligent but complicated cache design.
The main problem is that we would be putting policy into kernel space.

Currently, you either use the least-recently-used (LRU) in-memory
backend, or the all-in on-disk backend.
Users who want more precise control over which files/dirs shouldn't go
through dedupe can use the btrfs property to set a per-file flag to
avoid dedupe.
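
To make the LRU (least-recently-used) in-memory backend idea concrete,
here is a rough userspace sketch in Python. It is only an illustration
of the eviction behaviour, not the kernel code: the class name
LruHashPool, the method lookup_or_add and the "limit" knob are all
made up for this example, and SHA-256 is simply assumed as the hash.

```python
import hashlib
from collections import OrderedDict


class LruHashPool:
    """Toy in-memory dedupe hash pool with least-recently-used eviction.

    Hypothetical illustration only; names and layout are not taken from
    the btrfs patches.
    """

    def __init__(self, limit):
        self.limit = limit          # max number of hashes kept (assumed knob)
        self.pool = OrderedDict()   # digest -> block address, oldest first

    def lookup_or_add(self, data, bytenr):
        """Return the address of a block with the same hash, or record
        this block under `bytenr` and return None."""
        digest = hashlib.sha256(data).digest()
        found = self.pool.get(digest)
        if found is not None:
            # A hit makes the entry the most recently used one.
            self.pool.move_to_end(digest)
            return found
        self.pool[digest] = bytenr
        if len(self.pool) > self.limit:
            # Over the limit: drop the least recently used hash.
            self.pool.popitem(last=False)
        return None
```

With limit=2, writing blocks A, B, A, C would keep the hashes of A and
C: the second write of A is a hit and refreshes it, so B is the entry
that gets evicted when C arrives.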

>      * on the topic of leaning on the cache, I've been thinking about
> ways to optimize reads, while minimizing seeks on multi-spindle raid1
> btrfs volumes.  I'm guessing that someone will commit a solution
> before I manage to teach myself enough about filesystems to contribute
> something useful.
>
> That's it, in terms of features I want ;-)
>
> It's probably a well-known fact, but sha512 is roughly 40 to 50%
> faster than sha256, and 40 to 50% slower than sha1 on my 1200-series
> Xeon v3 (Haswell), for 8192 size blocks.

Sadly I didn't know that until recently. :(
Otherwise I would have implemented SHA512 as the hash algorithm instead
of SHA256.

Anyway, it's not that hard to add a new hash algorithm.
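
For anyone who wants to check the relative hash speed on their own
machine, a quick userspace measurement could look like the sketch
below. It uses Python's hashlib with the 8192-byte block size quoted
above; the function name "throughput" and the block count are my own
choices. Userspace numbers will not match the kernel crypto API
exactly, and CPUs with dedicated SHA instructions may show a different
ordering, so treat the result as a rough hint only.

```python
import hashlib
import os
import time


def throughput(name, block_size=8192, blocks=2000):
    """Hash `blocks` buffers of `block_size` bytes and return MiB/s."""
    data = os.urandom(block_size)
    start = time.perf_counter()
    for _ in range(blocks):
        hashlib.new(name, data).digest()
    elapsed = time.perf_counter() - start
    return block_size * blocks / elapsed / (1024 * 1024)


for name in ("sha1", "sha256", "sha512"):
    print(f"{name:7s} {throughput(name):8.1f} MiB/s")
```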

Thanks for your comments.
Qu

>
> Wish I could do more right now!
> Nicholas



