From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <linux-kernel-owner@vger.kernel.org>
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
        id S1751212AbdEPCgQ (ORCPT <rfc822;w@1wt.eu>);
        Mon, 15 May 2017 22:36:16 -0400
Received: from mail-pg0-f67.google.com ([74.125.83.67]:35423 "EHLO
        mail-pg0-f67.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
        with ESMTP id S1750713AbdEPCgP (ORCPT
        <rfc822;linux-kernel@vger.kernel.org>);
        Mon, 15 May 2017 22:36:15 -0400
Date: Tue, 16 May 2017 11:36:15 +0900
From: Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com>
To: Minchan Kim <minchan@kernel.org>
Cc: Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com>,
        Andrew Morton <akpm@linux-foundation.org>,
        linux-kernel@vger.kernel.org, Joonsoo Kim <iamjoonsoo.kim@lge.com>,
        Sergey Senozhatsky <sergey.senozhatsky@gmail.com>,
        kernel-team <kernel-team@lge.com>
Subject: Re: [PATCH 2/2] zram: do not count duplicated pages as compressed
Message-ID: <20170516023615.GC10262@jagdpanzerIV.localdomain>
References: <1494834068-27004-1-git-send-email-minchan@kernel.org>
 <1494834068-27004-2-git-send-email-minchan@kernel.org>
 <20170516013022.GB10262@jagdpanzerIV.localdomain>
 <20170516015919.GA5233@bbox>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <20170516015919.GA5233@bbox>
User-Agent: Mutt/1.8.2 (2017-04-18)
Sender: linux-kernel-owner@vger.kernel.org
List-ID: <linux-kernel.vger.kernel.org>
X-Mailing-List: linux-kernel@vger.kernel.org

Hello Minchan,

On (05/16/17 10:59), Minchan Kim wrote:
> Hi Sergey,
>

[..]
> You mean this?
> 
> static void zram_free_page(..) {
>         if (zram_test_flag(zram, index, ZRAM_SAME))
>                 ...
> 
>         if (!entry)
>                 return;
> 
>         if (zram_dedup_enabled(zram) && xxxx)) {
>                 zram_clear_flag(ZRAM_DUP);
>                 atomic64_sub(entry->len, &zram->stats.dup_data_size);
>         } else {
>                 atomic64_sub(zram_get_obj_size(zram, index),
>                                         &zram->stats.compr_dat_size);
>         }
> 
>         zram_entry_free
>         zram_set_entry
>         zram_set_obj_size
> }

yeah, something like this.

> > > @@ -794,7 +801,15 @@ static int __zram_bvec_write(struct zram *zram, struct bio_vec *bvec, u32 index)
> > >  	entry = zram_dedup_find(zram, page, &checksum);
> > >  	if (entry) {
> > >  		comp_len = entry->len;
> > > -		goto found_dup;
> > > +		zram_slot_lock(zram, index);
> > > +		zram_free_page(zram, index);
> > > +		zram_set_flag(zram, index, ZRAM_DUP);
> > > +		zram_set_entry(zram, index, entry);
> > > +		zram_set_obj_size(zram, index, comp_len);
> > > +		zram_slot_unlock(zram, index);
> > > +		atomic64_add(comp_len, &zram->stats.dup_data_size);
> > > +		atomic64_inc(&zram->stats.pages_stored);
> > > +		return 0;
> > 
> > hm. that's a somewhat big code duplication. isn't it?
> 
> Yub. 3 parts. above part,  zram_same_page_write and tail of __zram_bvec_write.

hmm... good question... hardly can think of anything significantly
better, zram object handling is now a mix of flags, entries,
ref_counters, etc. etc. may be we can merge some of those ops, if we
would keep slot locked through the entire __zram_bvec_write(), but
that does not look attractive.

something ABSOLUTELY untested and incomplete. not even compile tested (!).
99% broken and stupid (!). but there is one thing that it has revealed, so
thus incomplete. see below:

---

diff --git a/drivers/block/zram/zram_drv.c b/drivers/block/zram/zram_drv.c
index 372602c7da49..b31543c40d54 100644
--- a/drivers/block/zram/zram_drv.c
+++ b/drivers/block/zram/zram_drv.c
@@ -509,11 +509,8 @@ static bool zram_same_page_write(struct zram *zram, u32 index,
        if (page_same_filled(mem, &element)) {
                kunmap_atomic(mem);
                /* Free memory associated with this sector now. */
-               zram_slot_lock(zram, index);
-               zram_free_page(zram, index);
                zram_set_flag(zram, index, ZRAM_SAME);
                zram_set_element(zram, index, element);
-               zram_slot_unlock(zram, index);
 
                atomic64_inc(&zram->stats.same_pages);
                return true;
@@ -778,7 +775,7 @@ static int zram_compress(struct zram *zram, struct zcomp_strm **zstrm,
 
 static int __zram_bvec_write(struct zram *zram, struct bio_vec *bvec, u32 index)
 {
-       int ret;
+       int ret = 0;
        struct zram_entry *entry;
        unsigned int comp_len;
        void *src, *dst;
@@ -786,12 +783,20 @@ static int __zram_bvec_write(struct zram *zram, struct bio_vec *bvec, u32 index)
        struct page *page = bvec->bv_page;
        u32 checksum;
 
+       /*
+        * Free memory associated with this sector
+        * before overwriting unused sectors.
+        */
+       zram_slot_lock(zram, index);
+       zram_free_page(zram, index);
+
        if (zram_same_page_write(zram, index, page))
-               return 0;
+               goto out_unlock;
 
        entry = zram_dedup_find(zram, page, &checksum);
        if (entry) {
                comp_len = entry->len;
+               zram_set_flag(zram, index, ZRAM_DUP);
                goto found_dup;
        }
 
@@ -799,7 +804,7 @@ static int __zram_bvec_write(struct zram *zram, struct bio_vec *bvec, u32 index)
        ret = zram_compress(zram, &zstrm, page, &entry, &comp_len);
        if (ret) {
                zcomp_stream_put(zram->comp);
-               return ret;
+               goto out_unlock;
        }
 
        dst = zs_map_object(zram->mem_pool,
@@ -817,20 +822,16 @@ static int __zram_bvec_write(struct zram *zram, struct bio_vec *bvec, u32 index)
        zram_dedup_insert(zram, entry, checksum);
 
 found_dup:
-       /*
-        * Free memory associated with this sector
-        * before overwriting unused sectors.
-        */
-       zram_slot_lock(zram, index);
-       zram_free_page(zram, index);
        zram_set_entry(zram, index, entry);
        zram_set_obj_size(zram, index, comp_len);
-       zram_slot_unlock(zram, index);
 
        /* Update stats */
        atomic64_add(comp_len, &zram->stats.compr_data_size);
        atomic64_inc(&zram->stats.pages_stored);
-       return 0;
+
+out_unlock:
+       zram_slot_unlock(zram, index);
+       return ret;
 }

---


namely,
that zram_compress() error return path from __zram_bvec_write().

currently, we touch the existing compressed object and overwrite it only
when we successfully compressed a new object. when zram_compress() fails
we propagate the error, but never touch the old object. so all reads that
could hit that index later will read stale data. and probably it would
make more sense to fail those reads as well; IOW to free the old page
regardless zram_compress() progress.

what do you think?

	-ss