From: Minchan Kim <minchan@kernel.org>
To: Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com>
Cc: Andrew Morton <akpm@linux-foundation.org>,
	linux-kernel@vger.kernel.org,
	Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Subject: Re: [RFC] zram: support page-based parallel write
Date: Thu, 8 Sep 2016 11:49:40 +0900
Message-ID: <20160908024940.GB21691@bbox>
In-Reply-To: <20160908013444.GA505@swordfish>

Hi Sergey,

On Thu, Sep 08, 2016 at 10:34:44AM +0900, Sergey Senozhatsky wrote:
> Hello Minchan,
> 
> sorry, I don't have enough time at the moment to review it in details
> due to some urgent issues I'm working on.  can this wait?

Sure, why not.

I need some time to finish the work and remove the RFC tag, too.
I posted the incomplete code just to discuss the direction and the
overall design, not the code details, at this point.

While waiting for your time, I will improve the code quality so you
won't see this mess when you finally have time to review. :)

> 
> 
> I was looking at loop.c awhile ago and was considering to do something
> similar to what they have done; but it never happened.
> 
> 
> I'm a bit 'surprised' that the performance has just 2x, whilst there
> are 4x processing threads. I'd rather expect it to be close to 4x... hm.

Me, too.
It seems to be related to how the work is distributed across CPUs, but
I haven't investigated it in detail yet.

> 
> 
> On (09/06/16 16:24), Minchan Kim wrote:
> [..]
> > +static void worker_wake_up(void)
> > +{
> > +	if (workers.nr_running * NR_BATCH_PAGES < workers.nr_req) {
> > +		int nr_wakeup = (workers.nr_req + NR_BATCH_PAGES) /
> > +				NR_BATCH_PAGES - workers.nr_running;
> > +
> > +		WARN_ON(!nr_wakeup);
> > +		wake_up_nr(&workers.req_wait, nr_wakeup);
> >  	}
> > +}
> 
> this seems to be quite complicated. use a wq perhaps? yes, there is a

Sure, it's soooo messy. I'm reconsidering it now and should end up with
a better way.
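
(For reference, worker_wake_up() above tries to keep roughly one running
worker per NR_BATCH_PAGES pending page requests. As a worked example,
if NR_BATCH_PAGES were 64, an illustrative value not taken from the
patch, then with nr_req = 200 and nr_running = 1 the wake-up condition
1 * 64 < 200 holds, and nr_wakeup = (200 + 64) / 64 - 1 = 3 extra
workers would be woken.)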

About the wq: as you can guess from my function naming, I tried the
workqueue model first for a while. It had several issues, but I can't
describe them exactly because that would need more knowledge of the
workqueue internals than I have time for right now. Anyway, the most
important reason I went with the thread model is that a job's
completion time is totally unbounded: while a worker is handling one
request, another context can keep queueing requests endlessly. For
that case, I think the thread model is more appropriate.
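
To illustrate the direction, the consumer side could look roughly like
the sketch below. It reuses the names from the posted patch
(workers.req_list, workers.req_lock, workers.req_wait, workers.nr_req),
but struct page_request and handle_page_request() are placeholder names
for illustration only, and the nr_running accounting is omitted:

	/* sketch only: page_request and handle_page_request() are placeholders */
	static int zram_thread(void *data)
	{
		struct page_request *req;

		while (!kthread_should_stop()) {
			spin_lock(&workers.req_lock);
			if (list_empty(&workers.req_list)) {
				spin_unlock(&workers.req_lock);
				/* sleep until a bio queues more page requests */
				wait_event_interruptible(workers.req_wait,
					kthread_should_stop() ||
					!list_empty(&workers.req_list));
				continue;
			}
			req = list_first_entry(&workers.req_list,
						struct page_request, list);
			list_del(&req->list);
			workers.nr_req--;
			spin_unlock(&workers.req_lock);

			/* compress or decompress a single page */
			handle_page_request(req);
		}

		return 0;
	}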

> common concern with the wq that all of the workers can stall during OOM,
> but you already have 2 kmalloc()-s in IO path (when adding a new request),
> so low memory condition concerns are out of sight here, I assume.

Yes, it's not a problem, because the logic has a fallback mechanism that
handles the IO request synchronously if the async path fails for some
reason (mostly -ENOMEM).
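
Conceptually, the caller ends up doing something like the snippet below
(just a sketch; __zram_make_sync_request() is a placeholder name for the
existing synchronous path, not the actual function in the patch):

	/*
	 * Try the async path first; if queueing fails (e.g. the GFP_NOIO
	 * kmalloc above returns NULL), fall back to handling the bio
	 * synchronously so no request is ever dropped.
	 */
	if (__zram_make_async_request(zram, bio))
		__zram_make_sync_request(zram, bio);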

> 
> > +static int __zram_make_async_request(struct zram *zram, struct bio *bio)
> > +{
> > +	int offset;
> > +	u32 index;
> > +	struct bio_vec bvec;
> > +	struct bvec_iter iter;
> > +	LIST_HEAD(page_list);
> > +	struct bio_request *bio_req;
> > +	unsigned int nr_pages = 0;
> > +	bool write = op_is_write(bio_op(bio));
> > +
> > +	index = bio->bi_iter.bi_sector >> SECTORS_PER_PAGE_SHIFT;
> > +	offset = (bio->bi_iter.bi_sector &
> > +		  (SECTORS_PER_PAGE - 1)) << SECTOR_SHIFT;
> > +
> > +	bio_req = kmalloc(sizeof(*bio_req), GFP_NOIO);
> > +	if (!bio_req)
> > +		return 1;
> > +
> > +	/*
> > +	 * Keep bi_vcnt to complete bio handling when all of pages
> > +	 * in the bio are handled.
> > +	 */
> > +	bio_req->bio = bio;
> > +	atomic_set(&bio_req->nr_pages, 0);
> > +
> > +	bio_for_each_segment(bvec, bio, iter) {
> > +		int max_transfer_size = PAGE_SIZE - offset;
> > +
> > +		if (bvec.bv_len > max_transfer_size) {
> > +			/*
> > +			 * zram_bvec_rw() can only make operation on a single
> > +			 * zram page. Split the bio vector.
> > +			 */
> > +			struct bio_vec bv;
> > +
> > +			bv.bv_page = bvec.bv_page;
> > +			bv.bv_len = max_transfer_size;
> > +			bv.bv_offset = bvec.bv_offset;
> > +
> > +			if (queue_page_request_list(zram, bio_req, &bv,
> > +				index, offset, write, &page_list))
> > +				goto out;
> > +			nr_pages++;
> > +
> > +			bv.bv_len = bvec.bv_len - max_transfer_size;
> > +			bv.bv_offset += max_transfer_size;
> > +			if (queue_page_request_list(zram, bio_req, &bv,
> > +				index + 1, 0, write, &page_list))
> > +				goto out;
> > +			nr_pages++;
> > +		} else
> > +			if (queue_page_request_list(zram, bio_req, &bvec,
> > +				index, offset, write, &page_list))
> > +				goto out;
> > +			nr_pages++;
> 			^^^^^^^^^^
> +	else {
> 		if (queue_page_request_list(zram, bio_req, &bvec...
> 			.....
> 			nr_pages++;
> + 	}

Thanks, but that's wrong too. :)

The logic should be this way:

        } else {
                if (queue_page_request_list(zram, bio_req, &bvec,
                        index, offset, write, &page_list))
                        goto out;
                nr_pages++;
        }

I will fix it.

> 
> 
> > +		update_position(&index, &offset, &bvec);
> > +	}
> > +
> > +	run_worker(bio, &page_list, nr_pages);
> > +	return 0;
> > +
> > +out:
> > +	kfree(bio_req);
> > +
> > +	WARN_ON(!list_empty(&page_list));
> > +	return 1;
> > +}
> [..]
> > +static int create_workers(void)
> > +{
> > +	int i;
> > +	int nr_cpu = num_online_cpus();
> > +	struct zram_worker *worker;
> > +
> > +	INIT_LIST_HEAD(&workers.worker_list);
> > +	INIT_LIST_HEAD(&workers.req_list);
> > +	spin_lock_init(&workers.req_lock);
> > +	init_waitqueue_head(&workers.req_wait);
> > +
> > +	for (i = 0; i < nr_cpu; i++) {
> > +		worker = kmalloc(sizeof(*worker), GFP_KERNEL);
> > +		if (!worker)
> > +			goto error;
> > +
> > +		worker->task = kthread_run(zram_thread, NULL, "zramd-%d", i);
> 
> there may be several zram devices. may be    "zram-%d/%d, device_id, i"

The zram threads are global. IOW, even if there are a hundred zram
devices, the number of threads equals the number of online CPUs. ;)

> 
> > +		if (IS_ERR(worker->task)) {
> > +			kfree(worker);
> > +			goto error;
> > +		}
> > +
> > +		list_add(&worker->list, &workers.worker_list);
> > +	}
> > +
> > +	return 0;
> > +
> > +error:
> > +	destroy_workers();
> > +	return 1;
> > +}
> 
> 	-ss
