* [PATCH 1/2] staging: zram: minimize `slot_free_lock' usage (v2)
@ 2013-09-06 15:12 Sergey Senozhatsky
  2013-09-09 12:33 ` Dan Carpenter
  0 siblings, 1 reply; 27+ messages in thread
From: Sergey Senozhatsky @ 2013-09-06 15:12 UTC (permalink / raw)
  To: Greg Kroah-Hartman; +Cc: Minchan Kim, Jerome Marchand, devel, linux-kernel

Calling handle_pending_slot_free() for every RW operation may
cause unnecessary slot_free_lock locking, because most likely the
process will see a NULL slot_free_rq. Call handle_pending_slot_free()
only when the current process detects that slot_free_rq is not NULL.

v2: protect handle_pending_slot_free() with zram rw_lock.

Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>

---

 drivers/staging/zram/zram_drv.c | 7 +++++--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/drivers/staging/zram/zram_drv.c b/drivers/staging/zram/zram_drv.c
index 91d94b5..5bfbe0e 100644
--- a/drivers/staging/zram/zram_drv.c
+++ b/drivers/staging/zram/zram_drv.c
@@ -517,6 +517,7 @@ static void handle_pending_slot_free(struct zram *zram)
 {
 	struct zram_slot_free *free_rq;
 
+	down_write(&zram->lock);
 	spin_lock(&zram->slot_free_lock);
 	while (zram->slot_free_rq) {
 		free_rq = zram->slot_free_rq;
@@ -525,6 +526,7 @@ static void handle_pending_slot_free(struct zram *zram)
 		kfree(free_rq);
 	}
 	spin_unlock(&zram->slot_free_lock);
+	up_write(&zram->lock);
 }
 
 static int zram_bvec_rw(struct zram *zram, struct bio_vec *bvec, u32 index,
@@ -532,14 +534,15 @@ static int zram_bvec_rw(struct zram *zram, struct bio_vec *bvec, u32 index,
 {
 	int ret;
 
+	if (zram->slot_free_rq)
+		handle_pending_slot_free(zram);
+
 	if (rw == READ) {
 		down_read(&zram->lock);
-		handle_pending_slot_free(zram);
 		ret = zram_bvec_read(zram, bvec, index, offset, bio);
 		up_read(&zram->lock);
 	} else {
 		down_write(&zram->lock);
-		handle_pending_slot_free(zram);
 		ret = zram_bvec_write(zram, bvec, index, offset);
 		up_write(&zram->lock);
 	}


^ permalink raw reply related	[flat|nested] 27+ messages in thread

* Re: [PATCH 1/2] staging: zram: minimize `slot_free_lock' usage (v2)
  2013-09-06 15:12 [PATCH 1/2] staging: zram: minimize `slot_free_lock' usage (v2) Sergey Senozhatsky
@ 2013-09-09 12:33 ` Dan Carpenter
  2013-09-09 12:49   ` Sergey Senozhatsky
  0 siblings, 1 reply; 27+ messages in thread
From: Dan Carpenter @ 2013-09-09 12:33 UTC (permalink / raw)
  To: Sergey Senozhatsky
  Cc: Greg Kroah-Hartman, devel, Minchan Kim, Jerome Marchand, linux-kernel

On Fri, Sep 06, 2013 at 06:12:55PM +0300, Sergey Senozhatsky wrote:
> Calling handle_pending_slot_free() for every RW operation may
> cause unneccessary slot_free_lock locking, because most likely
> process will see NULL slot_free_rq. handle_pending_slot_free()
> only when current detects that slot_free_rq is not NULL.
> 
> v2: protect handle_pending_slot_free() with zram rw_lock.
> 

zram->slot_free_lock protects zram->slot_free_rq, but shouldn't the zram
rw_lock be wrapped around the whole operation like the original code
does?  I don't know the zram code, but the original looks like it makes
sense, whereas in this version the locks look duplicative.

Should the down_read() in the original code be changed to down_write()?

regards,
dan carpenter


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 1/2] staging: zram: minimize `slot_free_lock' usage (v2)
  2013-09-09 12:33 ` Dan Carpenter
@ 2013-09-09 12:49   ` Sergey Senozhatsky
  2013-09-09 13:21     ` Dan Carpenter
  0 siblings, 1 reply; 27+ messages in thread
From: Sergey Senozhatsky @ 2013-09-09 12:49 UTC (permalink / raw)
  To: Dan Carpenter
  Cc: Greg Kroah-Hartman, devel, Minchan Kim, Jerome Marchand, linux-kernel

> > Calling handle_pending_slot_free() for every RW operation may
> > cause unneccessary slot_free_lock locking, because most likely
> > process will see NULL slot_free_rq. handle_pending_slot_free()
> > only when current detects that slot_free_rq is not NULL.
> > 
> > v2: protect handle_pending_slot_free() with zram rw_lock.
> > 
> 
> zram->slot_free_lock protects zram->slot_free_rq but shouldn't the zram
> rw_lock be wrapped around the whole operation like the original code
> does?  I don't know the zram code, but the original looks like it makes
> sense but in this one it looks like the locks are duplicative.
> 
> Is the down_read() in the original code be changed to down_write()?
>

I'm not touching locking around existing READ/WRITE commands.

the original code:

static void handle_pending_slot_free(struct zram *zram)
{
        struct zram_slot_free *free_rq;

        spin_lock(&zram->slot_free_lock);
        while (zram->slot_free_rq) {
                free_rq = zram->slot_free_rq;
                zram->slot_free_rq = free_rq->next;
                zram_free_page(zram, free_rq->index);
                kfree(free_rq);
        }
        spin_unlock(&zram->slot_free_lock);
}

static int zram_bvec_rw(struct zram *zram, struct bio_vec *bvec, u32 index,
                        int offset, struct bio *bio, int rw)
{
        int ret;

        if (rw == READ) {
                down_read(&zram->lock);
                handle_pending_slot_free(zram);
                ret = zram_bvec_read(zram, bvec, index, offset, bio);
                up_read(&zram->lock);
        } else {
                down_write(&zram->lock);
                handle_pending_slot_free(zram);
                ret = zram_bvec_write(zram, bvec, index, offset);
                up_write(&zram->lock);
        }

        return ret;
}



the new one:

static void handle_pending_slot_free(struct zram *zram)
{
        struct zram_slot_free *free_rq;

        down_write(&zram->lock);
        spin_lock(&zram->slot_free_lock);
        while (zram->slot_free_rq) {
                free_rq = zram->slot_free_rq;
                zram->slot_free_rq = free_rq->next;
                zram_free_page(zram, free_rq->index);
                kfree(free_rq);
        }
        spin_unlock(&zram->slot_free_lock);
        up_write(&zram->lock);
}

static int zram_bvec_rw(struct zram *zram, struct bio_vec *bvec, u32 index,
                        int offset, struct bio *bio, int rw)
{
        int ret;

        if (zram->slot_free_rq)
                handle_pending_slot_free(zram);

        if (rw == READ) {
                down_read(&zram->lock);
                ret = zram_bvec_read(zram, bvec, index, offset, bio);
                up_read(&zram->lock);
        } else {
                down_write(&zram->lock);
                ret = zram_bvec_write(zram, bvec, index, offset);
                up_write(&zram->lock);
        }

        return ret;
}


Both READ and WRITE operations are still protected by down_read() for the READ path
and down_write() for the WRITE path. However, there is no handle_pending_slot_free()
call and no zram->slot_free_lock locking on every READ/WRITE; instead,
handle_pending_slot_free() is called only when zram->slot_free_rq is not NULL.
handle_pending_slot_free() in turn protects the zram_free_page() calls with
down_write(), so READ/WRITE operations are not affected.

	-ss

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 1/2] staging: zram: minimize `slot_free_lock' usage (v2)
  2013-09-09 12:49   ` Sergey Senozhatsky
@ 2013-09-09 13:21     ` Dan Carpenter
  2013-09-09 13:46       ` Jerome Marchand
  2013-09-09 14:42       ` Sergey Senozhatsky
  0 siblings, 2 replies; 27+ messages in thread
From: Dan Carpenter @ 2013-09-09 13:21 UTC (permalink / raw)
  To: Sergey Senozhatsky
  Cc: Greg Kroah-Hartman, devel, Minchan Kim, Jerome Marchand, linux-kernel

On Mon, Sep 09, 2013 at 03:49:42PM +0300, Sergey Senozhatsky wrote:
> > > Calling handle_pending_slot_free() for every RW operation may
> > > cause unneccessary slot_free_lock locking, because most likely
> > > process will see NULL slot_free_rq. handle_pending_slot_free()
> > > only when current detects that slot_free_rq is not NULL.
> > > 
> > > v2: protect handle_pending_slot_free() with zram rw_lock.
> > > 
> > 
> > zram->slot_free_lock protects zram->slot_free_rq but shouldn't the zram
> > rw_lock be wrapped around the whole operation like the original code
> > does?  I don't know the zram code, but the original looks like it makes
> > sense but in this one it looks like the locks are duplicative.
> > 
> > Is the down_read() in the original code be changed to down_write()?
> >
> 
> I'm not touching locking around existing READ/WRITE commands.
> 

Your patch does change the locking: instead of taking the
zram lock once, it now takes it, drops it, and then retakes it.  This
looks potentially racy to me, but I don't know the code, so I will defer
to any zram maintainer.

1) You haven't given us any performance numbers so it's not clear if the
   locking is even a problem.

2) The v2 patch introduces an obvious deadlock in zram_slot_free()
   because now we take the rw_lock twice.  Fix your testing to catch
   this kind of bug next time.

3) Explain why it is safe to test zram->slot_free_rq when we are not
   holding the lock.  I think it is unsafe.  I don't want to even think
   about it without the numbers.

regards,
dan carpenter


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 1/2] staging: zram: minimize `slot_free_lock' usage (v2)
  2013-09-09 13:21     ` Dan Carpenter
@ 2013-09-09 13:46       ` Jerome Marchand
  2013-09-09 16:10         ` Jerome Marchand
  2013-09-09 14:42       ` Sergey Senozhatsky
  1 sibling, 1 reply; 27+ messages in thread
From: Jerome Marchand @ 2013-09-09 13:46 UTC (permalink / raw)
  To: Dan Carpenter
  Cc: Sergey Senozhatsky, Greg Kroah-Hartman, devel, Minchan Kim, linux-kernel

On 09/09/2013 03:21 PM, Dan Carpenter wrote:
> On Mon, Sep 09, 2013 at 03:49:42PM +0300, Sergey Senozhatsky wrote:
>>>> Calling handle_pending_slot_free() for every RW operation may
>>>> cause unneccessary slot_free_lock locking, because most likely
>>>> process will see NULL slot_free_rq. handle_pending_slot_free()
>>>> only when current detects that slot_free_rq is not NULL.
>>>>
>>>> v2: protect handle_pending_slot_free() with zram rw_lock.
>>>>
>>>
>>> zram->slot_free_lock protects zram->slot_free_rq but shouldn't the zram
>>> rw_lock be wrapped around the whole operation like the original code
>>> does?  I don't know the zram code, but the original looks like it makes
>>> sense but in this one it looks like the locks are duplicative.
>>>
>>> Is the down_read() in the original code be changed to down_write()?
>>>
>>
>> I'm not touching locking around existing READ/WRITE commands.
>>
> 
> Your patch does change the locking because now instead of taking the
> zram lock once it takes it and then drops it and then retakes it.  This
> looks potentially racy to me but I don't know the code so I will defer
> to any zram maintainer.

You're right. Nothing prevents zram_slot_free_notify() from repopulating the
free slot queue while we drop the lock.

Actually, the original code is already racy. handle_pending_slot_free()
modifies zram->table while holding only a read lock. It needs to hold a
write lock to do that. Using down_write for all requests would obviously
fix that, but at the cost of read performance.
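
For concreteness, the problematic pattern Jerome describes, simplified from
the original code Sergey quoted above (the READ branch of zram_bvec_rw() plus
handle_pending_slot_free()):

	down_read(&zram->lock);
	/* may call zram_free_page(), which modifies zram->table */
	handle_pending_slot_free(zram);
	ret = zram_bvec_read(zram, bvec, index, offset, bio);
	up_read(&zram->lock);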

> 
> 1) You haven't given us any performance numbers so it's not clear if the
>    locking is even a problem.
> 
> 2) The v2 patch introduces an obvious deadlock in zram_slot_free()
>    because now we take the rw_lock twice.  Fix your testing to catch
>    this kind of bug next time.
> 
> 3) Explain why it is safe to test zram->slot_free_rq when we are not
>    holding the lock.  I think it is unsafe.  I don't want to even think
>    about it without the numbers.
> 
> regards,
> dan carpenter
> 


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 1/2] staging: zram: minimize `slot_free_lock' usage (v2)
  2013-09-09 13:21     ` Dan Carpenter
  2013-09-09 13:46       ` Jerome Marchand
@ 2013-09-09 14:42       ` Sergey Senozhatsky
  2013-09-09 14:52         ` Dan Carpenter
  1 sibling, 1 reply; 27+ messages in thread
From: Sergey Senozhatsky @ 2013-09-09 14:42 UTC (permalink / raw)
  To: Dan Carpenter
  Cc: Greg Kroah-Hartman, devel, Minchan Kim, Jerome Marchand, linux-kernel

On (09/09/13 16:21), Dan Carpenter wrote:
> On Mon, Sep 09, 2013 at 03:49:42PM +0300, Sergey Senozhatsky wrote:
> > > > Calling handle_pending_slot_free() for every RW operation may
> > > > cause unneccessary slot_free_lock locking, because most likely
> > > > process will see NULL slot_free_rq. handle_pending_slot_free()
> > > > only when current detects that slot_free_rq is not NULL.
> > > > 
> > > > v2: protect handle_pending_slot_free() with zram rw_lock.
> > > > 
> > > 
> > > zram->slot_free_lock protects zram->slot_free_rq but shouldn't the zram
> > > rw_lock be wrapped around the whole operation like the original code
> > > does?  I don't know the zram code, but the original looks like it makes
> > > sense but in this one it looks like the locks are duplicative.
> > > 
> > > Is the down_read() in the original code be changed to down_write()?
> > >
> > 
> > I'm not touching locking around existing READ/WRITE commands.
> > 
> 
> Your patch does change the locking because now instead of taking the
> zram lock once it takes it and then drops it and then retakes it.  This
> looks potentially racy to me but I don't know the code so I will defer
> to any zram maintainer.
>

It takes it only when there is a pending zram->slot_free_rq; otherwise it is just
a pointer comparison. The original code does (schematically, for the READ case):

down_read()
	spin_lock()
		if (!NULL) {...}
	spin_unlock();
up_read();


The patch adds the `if (!NULL)' check so that N concurrent readers don't get
stuck on spin_lock() just to immediately unlock it again because
zram->slot_free_rq is NULL.
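
For comparison, the patched (v2) READ case in the same schematic form, where
handle_pending_slot_free() itself takes the write lock internally:

if (slot_free_rq != NULL) {	/* usually false: no locking at all */
	down_write()
		spin_lock()
			/* drain slot_free_rq */
		spin_unlock()
	up_write()
}
down_read()
	zram_bvec_read()
up_read()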

> 1) You haven't given us any performance numbers so it's not clear if the
>    locking is even a problem.

Good point, I did not perform any testing. Here are the numbers:

 ./iozone -t -T -R -l 3 -u 3 -r 16K -s 60M -I +Z

	Record Size 16 KB
	File size set to 61440 KB
	O_DIRECT feature enabled
	Command line used: ./iozone -t -T -R -l 3 -u 3 -r 16K -s 60M -I +Z
	Output is in Kbytes/sec
	Time Resolution = 0.000001 seconds.
	Processor cache size set to 1024 Kbytes.
	Processor cache line size set to 32 bytes.
	File stride size set to 17 * record size.
	Min process = 3 
	Max process = 3 
	Throughput test with 3 processes
	Each process writes a 61440 Kbyte file in 16 Kbyte records


test             |    w/o patch   |    w/ patch
--------------------------------------------------
           Read  |   1895278.31   |   2778439.78
        Re-read  |   2638231.06   |   2630383.88
   Reverse Read  |   1378348.80   |   1538697.89
    Stride read  |   1698457.96   |   2043402.42
    Random read  |   1818998.33   |   2038670.38
 Mixed workload  |    376585.98   |    435064.57
          Pread  |   1402478.04   |    992575.22
          Fread  |   4955286.31   |   5199061.31

> 2) The v2 patch introduces an obvious deadlock in zram_slot_free()
>    because now we take the rw_lock twice.  Fix your testing to catch
>    this kind of bug next time.

Yes, it does, and I'm sorry about that. I sent v3 yesterday: https://lkml.org/lkml/2013/9/8/42
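
(For reference, the v2 deadlock, pieced together from the v2 diff at the top
of the thread and the current zram_slot_free(): the same non-recursive rwsem
is taken twice on one path.)

static void zram_slot_free(struct work_struct *work)
{
	struct zram *zram;

	zram = container_of(work, struct zram, free_work);
	down_write(&zram->lock);	/* first acquisition */
	handle_pending_slot_free(zram);	/* v2 takes zram->lock again in here */
	up_write(&zram->lock);
}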

> 3) Explain why it is safe to test zram->slot_free_rq when we are not
>    holding the lock.  I think it is unsafe.  I don't want to even think
>    about it without the numbers.

It's an atomic pointer test, which is either NULL or !NULL.

For the NULL case we don't take the spin lock and just skip the whole
handle_pending_slot_free() thing.

For !NULL we call handle_pending_slot_free(). Any operations on zram->slot_free_rq
(walking, removal or addition of elements) are protected by the spin lock within
handle_pending_slot_free(). Most requests will see a NULL zram->slot_free_rq.

If someone sets zram->slot_free_rq to !NULL right after the current process checked
it, then the next request will see it !NULL and perform handle_pending_slot_free().

	-ss

> regards,
> dan carpenter
>

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 1/2] staging: zram: minimize `slot_free_lock' usage (v2)
  2013-09-09 14:42       ` Sergey Senozhatsky
@ 2013-09-09 14:52         ` Dan Carpenter
  2013-09-09 15:09           ` Sergey Senozhatsky
  0 siblings, 1 reply; 27+ messages in thread
From: Dan Carpenter @ 2013-09-09 14:52 UTC (permalink / raw)
  To: Sergey Senozhatsky
  Cc: Greg Kroah-Hartman, devel, Minchan Kim, Jerome Marchand, linux-kernel

On Mon, Sep 09, 2013 at 05:42:59PM +0300, Sergey Senozhatsky wrote:
> > 3) Explain why it is safe to test zram->slot_free_rq when we are not
> >    holding the lock.  I think it is unsafe.  I don't want to even think
> >    about it without the numbers.
> 
> atomic pointer test, which is either NULL or !NULL.
> 

That's not how concurrency works.  Atomic types are more complicated than
that.  Anyway, the zram maintainers don't need me to explain that to
them, so I'll let them take over from here.

regards,
dan carpenter


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 1/2] staging: zram: minimize `slot_free_lock' usage (v2)
  2013-09-09 14:52         ` Dan Carpenter
@ 2013-09-09 15:09           ` Sergey Senozhatsky
  0 siblings, 0 replies; 27+ messages in thread
From: Sergey Senozhatsky @ 2013-09-09 15:09 UTC (permalink / raw)
  To: Dan Carpenter
  Cc: Greg Kroah-Hartman, devel, Minchan Kim, Jerome Marchand, linux-kernel

On (09/09/13 17:52), Dan Carpenter wrote:
> On Mon, Sep 09, 2013 at 05:42:59PM +0300, Sergey Senozhatsky wrote:
> > > 3) Explain why it is safe to test zram->slot_free_rq when we are not
> > >    holding the lock.  I think it is unsafe.  I don't want to even think
> > >    about it without the numbers.
> > 
> > atomic pointer test, which is either NULL or !NULL.
> > 
> 
> That's not how concurency works.  Atomic types are complicated than
> that.  Anyway, the zram maintainers don't need me to explain that to
> them so I'll let them take over from here.
> 

Yes, I understand that. But can't we check the slot_free_rq pointer
(a 32- or 64-bit read) without locking, just to decide whether we must:
-- call handle_pending_slot_free()
-- take the slot_free_lock
-- check slot_free_rq again (this time under the slot_free_lock) and perform
   the slot_free_rq operations while it is !NULL.
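
A rough sketch of that check-then-lock-then-recheck pattern, reusing the body
of handle_pending_slot_free() quoted earlier; the ACCESS_ONCE() is only an
illustration of making the lockless read explicit, it is not in the posted
patch:

	if (ACCESS_ONCE(zram->slot_free_rq)) {	/* unlocked hint, may race */
		spin_lock(&zram->slot_free_lock);
		while (zram->slot_free_rq) {	/* re-checked under the lock */
			free_rq = zram->slot_free_rq;
			zram->slot_free_rq = free_rq->next;
			zram_free_page(zram, free_rq->index);
			kfree(free_rq);
		}
		spin_unlock(&zram->slot_free_lock);
	}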

	-ss

> regards,
> dan carpenter
> 

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 1/2] staging: zram: minimize `slot_free_lock' usage (v2)
  2013-09-09 13:46       ` Jerome Marchand
@ 2013-09-09 16:10         ` Jerome Marchand
  2013-09-10 14:34           ` Sergey Senozhatsky
  0 siblings, 1 reply; 27+ messages in thread
From: Jerome Marchand @ 2013-09-09 16:10 UTC (permalink / raw)
  To: Dan Carpenter
  Cc: Sergey Senozhatsky, Greg Kroah-Hartman, devel, Minchan Kim, linux-kernel

On 09/09/2013 03:46 PM, Jerome Marchand wrote:
> On 09/09/2013 03:21 PM, Dan Carpenter wrote:
>> On Mon, Sep 09, 2013 at 03:49:42PM +0300, Sergey Senozhatsky wrote:
>>>>> Calling handle_pending_slot_free() for every RW operation may
>>>>> cause unneccessary slot_free_lock locking, because most likely
>>>>> process will see NULL slot_free_rq. handle_pending_slot_free()
>>>>> only when current detects that slot_free_rq is not NULL.
>>>>>
>>>>> v2: protect handle_pending_slot_free() with zram rw_lock.
>>>>>
>>>>
>>>> zram->slot_free_lock protects zram->slot_free_rq but shouldn't the zram
>>>> rw_lock be wrapped around the whole operation like the original code
>>>> does?  I don't know the zram code, but the original looks like it makes
>>>> sense but in this one it looks like the locks are duplicative.
>>>>
>>>> Is the down_read() in the original code be changed to down_write()?
>>>>
>>>
>>> I'm not touching locking around existing READ/WRITE commands.
>>>
>>
>> Your patch does change the locking because now instead of taking the
>> zram lock once it takes it and then drops it and then retakes it.  This
>> looks potentially racy to me but I don't know the code so I will defer
>> to any zram maintainer.
> 
> You're right. Nothing prevents zram_slot_free_notify() to repopulate the
> free slot queue while we drop the lock.
> 
> Actually, the original code is already racy. handle_pending_slot_free()
> modifies zram->table while holding only a read lock. It needs to hold a
> write lock to do that. Using down_write for all requests would obviously
> fix that, but at the cost of read performance.

Now I think we can drop the call to handle_pending_slot_free() in
zram_bvec_rw() altogether. As long as the write lock is held when
handle_pending_slot_free() is called, there is no race. It's no different
from any write request and the current code handles R/W concurrency
already.

Jerome

> 
>>
>> 1) You haven't given us any performance numbers so it's not clear if the
>>    locking is even a problem.
>>
>> 2) The v2 patch introduces an obvious deadlock in zram_slot_free()
>>    because now we take the rw_lock twice.  Fix your testing to catch
>>    this kind of bug next time.
>>
>> 3) Explain why it is safe to test zram->slot_free_rq when we are not
>>    holding the lock.  I think it is unsafe.  I don't want to even think
>>    about it without the numbers.
>>
>> regards,
>> dan carpenter
>>
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/
> 


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 1/2] staging: zram: minimize `slot_free_lock' usage (v2)
  2013-09-09 16:10         ` Jerome Marchand
@ 2013-09-10 14:34           ` Sergey Senozhatsky
  2013-09-10 14:58             ` Dan Carpenter
  2013-09-10 23:27             ` [PATCH 1/2] staging: zram: minimize `slot_free_lock' usage (v2) Sergey Senozhatsky
  0 siblings, 2 replies; 27+ messages in thread
From: Sergey Senozhatsky @ 2013-09-10 14:34 UTC (permalink / raw)
  To: Jerome Marchand
  Cc: Dan Carpenter, Greg Kroah-Hartman, devel, Minchan Kim, linux-kernel

On (09/09/13 18:10), Jerome Marchand wrote:
> On 09/09/2013 03:46 PM, Jerome Marchand wrote:
> > On 09/09/2013 03:21 PM, Dan Carpenter wrote:
> >> On Mon, Sep 09, 2013 at 03:49:42PM +0300, Sergey Senozhatsky wrote:
> >>>>> Calling handle_pending_slot_free() for every RW operation may
> >>>>> cause unneccessary slot_free_lock locking, because most likely
> >>>>> process will see NULL slot_free_rq. handle_pending_slot_free()
> >>>>> only when current detects that slot_free_rq is not NULL.
> >>>>>
> >>>>> v2: protect handle_pending_slot_free() with zram rw_lock.
> >>>>>
> >>>>
> >>>> zram->slot_free_lock protects zram->slot_free_rq but shouldn't the zram
> >>>> rw_lock be wrapped around the whole operation like the original code
> >>>> does?  I don't know the zram code, but the original looks like it makes
> >>>> sense but in this one it looks like the locks are duplicative.
> >>>>
> >>>> Is the down_read() in the original code be changed to down_write()?
> >>>>
> >>>
> >>> I'm not touching locking around existing READ/WRITE commands.
> >>>
> >>
> >> Your patch does change the locking because now instead of taking the
> >> zram lock once it takes it and then drops it and then retakes it.  This
> >> looks potentially racy to me but I don't know the code so I will defer
> >> to any zram maintainer.
> > 
> > You're right. Nothing prevents zram_slot_free_notify() to repopulate the
> > free slot queue while we drop the lock.
> > 
> > Actually, the original code is already racy. handle_pending_slot_free()
> > modifies zram->table while holding only a read lock. It needs to hold a
> > write lock to do that. Using down_write for all requests would obviously
> > fix that, but at the cost of read performance.
> 
> Now I think we can drop the call to handle_pending_slot_free() in
> zram_bvec_rw() altogether. As long as the write lock is held when
> handle_pending_slot_free() is called, there is no race. It's no different
> from any write request and the current code handles R/W concurrency
> already.

Yes, I think that can work. 

To summarize, there should be 3 patches:
1) drop handle_pending_slot_free() from zram_bvec_rw() (as suggested by Jerome Marchand)
2) fix the handle_pending_slot_free() race with reset (found by Dan Carpenter)
3) drop zram->init_done and use init_done()

I'll prepare the patches later today.

	-ss

> Jerome
> 
> > 
> >>
> >> 1) You haven't given us any performance numbers so it's not clear if the
> >>    locking is even a problem.
> >>
> >> 2) The v2 patch introduces an obvious deadlock in zram_slot_free()
> >>    because now we take the rw_lock twice.  Fix your testing to catch
> >>    this kind of bug next time.
> >>
> >> 3) Explain why it is safe to test zram->slot_free_rq when we are not
> >>    holding the lock.  I think it is unsafe.  I don't want to even think
> >>    about it without the numbers.
> >>
> >> regards,
> >> dan carpenter
> >>
> > 
> > --
> > To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> > the body of a message to majordomo@vger.kernel.org
> > More majordomo info at  http://vger.kernel.org/majordomo-info.html
> > Please read the FAQ at  http://www.tux.org/lkml/
> > 
> 

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 1/2] staging: zram: minimize `slot_free_lock' usage (v2)
  2013-09-10 14:34           ` Sergey Senozhatsky
@ 2013-09-10 14:58             ` Dan Carpenter
  2013-09-10 15:15               ` Greg Kroah-Hartman
                                 ` (2 more replies)
  2013-09-10 23:27             ` [PATCH 1/2] staging: zram: minimize `slot_free_lock' usage (v2) Sergey Senozhatsky
  1 sibling, 3 replies; 27+ messages in thread
From: Dan Carpenter @ 2013-09-10 14:58 UTC (permalink / raw)
  To: Sergey Senozhatsky
  Cc: Jerome Marchand, Greg Kroah-Hartman, devel, Minchan Kim, linux-kernel

Btw, the devel@driverdev.osuosl.org list seems to be down again.  I
still have not received the v3 patch.  Use the
driverdev-devel@linuxdriverproject.org email list instead.

regards,
dan carpenter


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 1/2] staging: zram: minimize `slot_free_lock' usage (v2)
  2013-09-10 14:58             ` Dan Carpenter
@ 2013-09-10 15:15               ` Greg Kroah-Hartman
  2013-09-10 23:12                 ` Sergey Senozhatsky
  2013-09-10 23:19                 ` Sergey Senozhatsky
  2 siblings, 0 replies; 27+ messages in thread
From: Greg Kroah-Hartman @ 2013-09-10 15:15 UTC (permalink / raw)
  To: Dan Carpenter
  Cc: Sergey Senozhatsky, Jerome Marchand, devel, Minchan Kim, linux-kernel

On Tue, Sep 10, 2013 at 05:58:02PM +0300, Dan Carpenter wrote:
> Btw, the devel@driverdev.osuosl.org list seems to be down again.  I
> still have not recieved the v3 patch.  Use the
> driverdev-devel@linuxdriverproject.org email list instead.

They are the same "list", just different DNS entries.  I'll go poke the
sysadmin to find out what's going on...


^ permalink raw reply	[flat|nested] 27+ messages in thread

* [PATCH 1/2] staging: zram: fix handle_pending_slot_free() and zram_reset_device() race
  2013-09-10 14:58             ` Dan Carpenter
@ 2013-09-10 23:12                 ` Sergey Senozhatsky
  2013-09-10 23:12                 ` Sergey Senozhatsky
  2013-09-10 23:19                 ` Sergey Senozhatsky
  2 siblings, 0 replies; 27+ messages in thread
From: Sergey Senozhatsky @ 2013-09-10 23:12 UTC (permalink / raw)
  To: Dan Carpenter; +Cc: Jerome Marchand, driverdev-devel, Minchan Kim, linux-kernel

Dan Carpenter noted that handle_pending_slot_free() is racy with
zram_reset_device(). Take the write init_lock in zram_slot_free(), thus
preventing any concurrent zram_slot_free(), zram_bvec_rw() or
zram_reset_device(). This also allows us to safely check zram->init_done
in handle_pending_slot_free().

The initial intention was to minimize the number of handle_pending_slot_free()
calls from zram_bvec_rw(), which were slowing down READ requests due to the
slot_free_lock spin lock. Jerome Marchand suggested removing
handle_pending_slot_free() from zram_bvec_rw().

Link: https://lkml.org/lkml/2013/9/9/172
Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>

---

 drivers/staging/zram/zram_drv.c | 13 +++++--------
 1 file changed, 5 insertions(+), 8 deletions(-)

diff --git a/drivers/staging/zram/zram_drv.c b/drivers/staging/zram/zram_drv.c
index 91d94b5..7a2d4de 100644
--- a/drivers/staging/zram/zram_drv.c
+++ b/drivers/staging/zram/zram_drv.c
@@ -521,7 +521,8 @@ static void handle_pending_slot_free(struct zram *zram)
 	while (zram->slot_free_rq) {
 		free_rq = zram->slot_free_rq;
 		zram->slot_free_rq = free_rq->next;
-		zram_free_page(zram, free_rq->index);
+		if (zram->init_done)
+			zram_free_page(zram, free_rq->index);
 		kfree(free_rq);
 	}
 	spin_unlock(&zram->slot_free_lock);
@@ -534,16 +535,13 @@ static int zram_bvec_rw(struct zram *zram, struct bio_vec *bvec, u32 index,
 
 	if (rw == READ) {
 		down_read(&zram->lock);
-		handle_pending_slot_free(zram);
 		ret = zram_bvec_read(zram, bvec, index, offset, bio);
 		up_read(&zram->lock);
 	} else {
 		down_write(&zram->lock);
-		handle_pending_slot_free(zram);
 		ret = zram_bvec_write(zram, bvec, index, offset);
 		up_write(&zram->lock);
 	}
-
 	return ret;
 }
 
@@ -750,12 +748,11 @@ error:
 
 static void zram_slot_free(struct work_struct *work)
 {
-	struct zram *zram;
+	struct zram *zram = container_of(work, struct zram, free_work);
 
-	zram = container_of(work, struct zram, free_work);
-	down_write(&zram->lock);
+	down_write(&zram->init_lock);
 	handle_pending_slot_free(zram);
-	up_write(&zram->lock);
+	up_write(&zram->init_lock);
 }
 
 static void add_slot_free(struct zram *zram, struct zram_slot_free *free_rq)


^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCH 2/2] staging: zram: remove init_done from zram struct (v3)
  2013-09-10 14:58             ` Dan Carpenter
@ 2013-09-10 23:19                 ` Sergey Senozhatsky
  2013-09-10 23:12                 ` Sergey Senozhatsky
  2013-09-10 23:19                 ` Sergey Senozhatsky
  2 siblings, 0 replies; 27+ messages in thread
From: Sergey Senozhatsky @ 2013-09-10 23:19 UTC (permalink / raw)
  To: Dan Carpenter; +Cc: Jerome Marchand, driverdev-devel, Minchan Kim, linux-kernel

`zram->init_done' in fact mirrors the `zram->meta != NULL' condition.
Introduce an init_done() function that checks zram->meta (i.e. checks
whether initialisation was performed), so `zram->init_done'
can be removed.

v3: init_done() in handle_pending_slot_free()
v2: introduce init_done()

Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>

---

 drivers/staging/zram/zram_drv.c | 23 ++++++++++++-----------
 drivers/staging/zram/zram_drv.h |  1 -
 2 files changed, 12 insertions(+), 12 deletions(-)

diff --git a/drivers/staging/zram/zram_drv.c b/drivers/staging/zram/zram_drv.c
index 7a2d4de..dcfe07c 100644
--- a/drivers/staging/zram/zram_drv.c
+++ b/drivers/staging/zram/zram_drv.c
@@ -42,6 +42,11 @@ static struct zram *zram_devices;
 /* Module params (documentation at end) */
 static unsigned int num_devices = 1;
 
+static inline int init_done(struct zram *zram)
+{
+	return zram->meta != NULL;
+}
+
 static inline struct zram *dev_to_zram(struct device *dev)
 {
 	return (struct zram *)dev_to_disk(dev)->private_data;
@@ -60,7 +65,7 @@ static ssize_t initstate_show(struct device *dev,
 {
 	struct zram *zram = dev_to_zram(dev);
 
-	return sprintf(buf, "%u\n", zram->init_done);
+	return sprintf(buf, "%u\n", init_done(zram));
 }
 
 static ssize_t num_reads_show(struct device *dev,
@@ -133,7 +138,7 @@ static ssize_t mem_used_total_show(struct device *dev,
 	struct zram_meta *meta = zram->meta;
 
 	down_read(&zram->init_lock);
-	if (zram->init_done)
+	if (init_done(zram))
 		val = zs_get_total_size_bytes(meta->mem_pool);
 	up_read(&zram->init_lock);
 
@@ -521,7 +526,7 @@ static void handle_pending_slot_free(struct zram *zram)
 	while (zram->slot_free_rq) {
 		free_rq = zram->slot_free_rq;
 		zram->slot_free_rq = free_rq->next;
-		if (zram->init_done)
+		if (init_done(zram))
 			zram_free_page(zram, free_rq->index);
 		kfree(free_rq);
 	}
@@ -553,14 +558,12 @@ static void zram_reset_device(struct zram *zram, bool reset_capacity)
 	flush_work(&zram->free_work);
 
 	down_write(&zram->init_lock);
-	if (!zram->init_done) {
+	if (!init_done(zram)) {
 		up_write(&zram->init_lock);
 		return;
 	}
 
 	meta = zram->meta;
-	zram->init_done = 0;
-
 	/* Free all pages that are still in this zram device */
 	for (index = 0; index < zram->disksize >> PAGE_SHIFT; index++) {
 		unsigned long handle = meta->table[index].handle;
@@ -599,9 +602,7 @@ static void zram_init_device(struct zram *zram, struct zram_meta *meta)
 
 	/* zram devices sort of resembles non-rotational disks */
 	queue_flag_set_unlocked(QUEUE_FLAG_NONROT, zram->disk->queue);
-
 	zram->meta = meta;
-	zram->init_done = 1;
 
 	pr_debug("Initialization done!\n");
 }
@@ -620,7 +621,7 @@ static ssize_t disksize_store(struct device *dev,
 	disksize = PAGE_ALIGN(disksize);
 	meta = zram_meta_alloc(disksize);
 	down_write(&zram->init_lock);
-	if (zram->init_done) {
+	if (init_done(zram)) {
 		up_write(&zram->init_lock);
 		zram_meta_free(meta);
 		pr_info("Cannot change disksize for initialized device\n");
@@ -728,7 +729,7 @@ static void zram_make_request(struct request_queue *queue, struct bio *bio)
 	struct zram *zram = queue->queuedata;
 
 	down_read(&zram->init_lock);
-	if (unlikely(!zram->init_done))
+	if (unlikely(!init_done(zram)))
 		goto error;
 
 	if (!valid_io_request(zram, bio)) {
@@ -876,7 +877,7 @@ static int create_device(struct zram *zram, int device_id)
 		goto out_free_disk;
 	}
 
-	zram->init_done = 0;
+	zram->meta = NULL;
 	return 0;
 
 out_free_disk:
diff --git a/drivers/staging/zram/zram_drv.h b/drivers/staging/zram/zram_drv.h
index 97a3acf..b1100cf 100644
--- a/drivers/staging/zram/zram_drv.h
+++ b/drivers/staging/zram/zram_drv.h
@@ -110,7 +110,6 @@ struct zram {
 
 	struct request_queue *queue;
 	struct gendisk *disk;
-	int init_done;
 	/* Prevent concurrent execution of device init, reset and R/W request */
 	struct rw_semaphore init_lock;
 	/*


^ permalink raw reply related	[flat|nested] 27+ messages in thread

* Re: [PATCH 1/2] staging: zram: minimize `slot_free_lock' usage (v2)
  2013-09-10 14:34           ` Sergey Senozhatsky
  2013-09-10 14:58             ` Dan Carpenter
@ 2013-09-10 23:27             ` Sergey Senozhatsky
  1 sibling, 0 replies; 27+ messages in thread
From: Sergey Senozhatsky @ 2013-09-10 23:27 UTC (permalink / raw)
  To: Dan Carpenter
  Cc: Jerome Marchand, Greg Kroah-Hartman, driverdev-devel,
	Minchan Kim, linux-kernel

On (09/10/13 17:34), Sergey Senozhatsky wrote:
[..]
> > 
> > Now I think we can drop the call to handle_pending_slot_free() in
> > zram_bvec_rw() altogether. As long as the write lock is held when
> > handle_pending_slot_free() is called, there is no race. It's no different
> > from any write request and the current code handles R/W concurrency
> > already.
> 
> Yes, I think that can work. 
> 
> To summarize, there should be 3 patches:
> 1) handle_pending_slot_free() in zram_bvec_rw() (as suggested by Jerome Marchand)
> 2) handle_pending_slot_free() race with reset (found by Dan Carpenter)
> 3) drop init_done and use init_done()
> 
> I'll prepare a patches later today.

I've sent two patches:
 staging: zram: fix handle_pending_slot_free() and zram_reset_device() race
 staging: zram: remove init_done from zram struct (v3)

Cc'd driverdev-devel@linuxdriverproject.org as suggested by Dan.

Please discard any previous patches, and sorry for the noise.

Thanks,
	-ss

> 
> > Jerome
> > 
> > > 
> > >>
> > >> 1) You haven't given us any performance numbers so it's not clear if the
> > >>    locking is even a problem.
> > >>
> > >> 2) The v2 patch introduces an obvious deadlock in zram_slot_free()
> > >>    because now we take the rw_lock twice.  Fix your testing to catch
> > >>    this kind of bug next time.
> > >>
> > >> 3) Explain why it is safe to test zram->slot_free_rq when we are not
> > >>    holding the lock.  I think it is unsafe.  I don't want to even think
> > >>    about it without the numbers.
> > >>
> > >> regards,
> > >> dan carpenter
> > >>
> > > 
> > > --
> > > To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> > > the body of a message to majordomo@vger.kernel.org
> > > More majordomo info at  http://vger.kernel.org/majordomo-info.html
> > > Please read the FAQ at  http://www.tux.org/lkml/
> > > 
> > 

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 1/2] staging: zram: fix handle_pending_slot_free() and zram_reset_device() race
  2013-09-10 23:12                 ` Sergey Senozhatsky
  (?)
@ 2013-09-12 22:12                 ` Greg KH
  2013-09-13  9:17                   ` Sergey Senozhatsky
  -1 siblings, 1 reply; 27+ messages in thread
From: Greg KH @ 2013-09-12 22:12 UTC (permalink / raw)
  To: Sergey Senozhatsky
  Cc: Dan Carpenter, Minchan Kim, driverdev-devel, Jerome Marchand,
	linux-kernel

On Wed, Sep 11, 2013 at 02:12:50AM +0300, Sergey Senozhatsky wrote:
> Dan Carpenter noted that handle_pending_slot_free() is racy with
> zram_reset_device(). Take write init_lock in zram_slot_free(), thus
> preventing any concurrent zram_slot_free(), zram_bvec_rw() or
> zram_reset_device(). This also allows to safely check zram->init_done
> in handle_pending_slot_free().
> 
> Initial intention was to minimze number of handle_pending_slot_free()
> call from zram_bvec_rw(), which were slowing down READ requests due to
> slot_free_lock spin lock. Jerome Marchand suggested to remove
> handle_pending_slot_free() from zram_bvec_rw().
> 
> Link: https://lkml.org/lkml/2013/9/9/172
> Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>

I have multiple versions of this and the other zram patches from you,
with no idea of which to accept.

So, I'm going to drop them all, can you please resend what you wish to
submit, and in the future, be a bit more obvious with your "vN"
markings?

thanks,

greg k-h

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 1/2] staging: zram: fix handle_pending_slot_free() and zram_reset_device() race
  2013-09-12 22:12                 ` Greg KH
@ 2013-09-13  9:17                   ` Sergey Senozhatsky
  0 siblings, 0 replies; 27+ messages in thread
From: Sergey Senozhatsky @ 2013-09-13  9:17 UTC (permalink / raw)
  To: Greg KH
  Cc: Dan Carpenter, Minchan Kim, driverdev-devel, Jerome Marchand,
	linux-kernel

On (09/12/13 15:12), Greg KH wrote:
> On Wed, Sep 11, 2013 at 02:12:50AM +0300, Sergey Senozhatsky wrote:
> > Dan Carpenter noted that handle_pending_slot_free() is racy with
> > zram_reset_device(). Take write init_lock in zram_slot_free(), thus
> > preventing any concurrent zram_slot_free(), zram_bvec_rw() or
> > zram_reset_device(). This also allows to safely check zram->init_done
> > in handle_pending_slot_free().
> > 
> > Initial intention was to minimze number of handle_pending_slot_free()
> > call from zram_bvec_rw(), which were slowing down READ requests due to
> > slot_free_lock spin lock. Jerome Marchand suggested to remove
> > handle_pending_slot_free() from zram_bvec_rw().
> > 
> > Link: https://lkml.org/lkml/2013/9/9/172
> > Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
> 
> I have multiple versions of this and the other zram patches from you,
> with no idea of which to accept.

Yes, please drop all the patches. I did not Cc you on these two
patches to avoid spamming your inbox with multiple versions.
I will send them to you again as soon as I get positive feedback.


> So, I'm going to drop them all, can you please resend what you wish to
> submit, and in the future, be a bit more obvious with your "vN"
> markings?
>

sorry for that.

	-ss

> thanks,
> 
> greg k-h
> 

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 1/2] staging: zram: fix handle_pending_slot_free() and zram_reset_device() race
  2013-09-10 23:12                 ` Sergey Senozhatsky
@ 2013-09-16  0:02                   ` Minchan Kim
  -1 siblings, 0 replies; 27+ messages in thread
From: Minchan Kim @ 2013-09-16  0:02 UTC (permalink / raw)
  To: Sergey Senozhatsky
  Cc: Dan Carpenter, Jerome Marchand, driverdev-devel, linux-kernel

Hello Sergey,

Sorry for the really slow response. I was really busy with internal work,
and thanks for pointing out the bug, Dan, Jerome and Sergey.
I read your threads roughly, so I may have missed something. If so, sorry
for that. Anyway, I will put in my opinion.

On Wed, Sep 11, 2013 at 02:12:50AM +0300, Sergey Senozhatsky wrote:
> Dan Carpenter noted that handle_pending_slot_free() is racy with
> zram_reset_device(). Take write init_lock in zram_slot_free(), thus

Right, but "init_lock" is what I really want to remove.
Yes, it's just a read-side lock, so most of the time it doesn't hurt us, but it
makes the code very complicated and deadlock prone, so I'd like to replace
it with RCU. Yeah, it's off topic, but just let me state my opinion on the
future direction.

About the bug, how about moving flush_work below down_write(init_lock)?
zram_make_request is already covered by init_lock, and we have a rule about
lock ordering as follows, so I don't see any problem:

  init_lock
    zram->lock

> preventing any concurrent zram_slot_free(), zram_bvec_rw() or
> zram_reset_device(). This also allows to safely check zram->init_done
> in handle_pending_slot_free().
> 
> Initial intention was to minimze number of handle_pending_slot_free()
> call from zram_bvec_rw(), which were slowing down READ requests due to
> slot_free_lock spin lock. Jerome Marchand suggested to remove
> handle_pending_slot_free() from zram_bvec_rw().
> 
> Link: https://lkml.org/lkml/2013/9/9/172
> Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
> 
> ---
> 
>  drivers/staging/zram/zram_drv.c | 13 +++++--------
>  1 file changed, 5 insertions(+), 8 deletions(-)
> 
> diff --git a/drivers/staging/zram/zram_drv.c b/drivers/staging/zram/zram_drv.c
> index 91d94b5..7a2d4de 100644
> --- a/drivers/staging/zram/zram_drv.c
> +++ b/drivers/staging/zram/zram_drv.c
> @@ -521,7 +521,8 @@ static void handle_pending_slot_free(struct zram *zram)
>  	while (zram->slot_free_rq) {
>  		free_rq = zram->slot_free_rq;
>  		zram->slot_free_rq = free_rq->next;
> -		zram_free_page(zram, free_rq->index);
> +		if (zram->init_done)
> +			zram_free_page(zram, free_rq->index);
>  		kfree(free_rq);
>  	}
>  	spin_unlock(&zram->slot_free_lock);
> @@ -534,16 +535,13 @@ static int zram_bvec_rw(struct zram *zram, struct bio_vec *bvec, u32 index,
>  
>  	if (rw == READ) {
>  		down_read(&zram->lock);
> -		handle_pending_slot_free(zram);

The read side is okay, but actually I have a nitpick.
If someone polls a block in a zram-swap device, he would suddenly see the block
has a zero value although there was no I/O (I don't want to argue whether
that's a sane user or not). That never happens on a real block device and
it never happens on a zram block device; it can only happen on a zram-swap device.
Such behavior has been there since we introduced swap_slot_free_notify
(off-topic: I'd like to remove it because it creates tight coupling between
zram and swap and is obviously a layering violation),
so for now I don't have a strong objection.

The idea behind removing swap_slot_free_notify is to use frontswap when the
user wants to use zram as swap, so zram can be notified when a block loses
its owner, but we would still have to solve the mutex problem in the notify
handler.


>  		ret = zram_bvec_read(zram, bvec, index, offset, bio);
>  		up_read(&zram->lock);
>  	} else {
>  		down_write(&zram->lock);
> -		handle_pending_slot_free(zram);

Why did you remove this on the write side?
We can't predict when the work will trigger, which means the deferred work
could remove a valid block under us (e.g. free a slot that a new write has
just repopulated).


>  		ret = zram_bvec_write(zram, bvec, index, offset);
>  		up_write(&zram->lock);
>  	}
> -
>  	return ret;
>  }
>  
> @@ -750,12 +748,11 @@ error:
>  
>  static void zram_slot_free(struct work_struct *work)
>  {
> -	struct zram *zram;
> +	struct zram *zram = container_of(work, struct zram, free_work);
>  
> -	zram = container_of(work, struct zram, free_work);
> -	down_write(&zram->lock);
> +	down_write(&zram->init_lock);

I don't like this.
The primary problem is that we should handle it atomically, which means we
should use a spinlock instead of a mutex. Yeah, /me kicks his ass. I should
have solved this problem that way from the beginning.

The simple solution that popped into my mind is this:


diff --git a/drivers/staging/zram/zram_drv.c b/drivers/staging/zram/zram_drv.c
index 91d94b5..b23bf0e 100644
--- a/drivers/staging/zram/zram_drv.c
+++ b/drivers/staging/zram/zram_drv.c
@@ -534,11 +534,14 @@ static int zram_bvec_rw(struct zram *zram, struct bio_vec *bvec, u32 index,
 
 	if (rw == READ) {
 		down_read(&zram->lock);
-		handle_pending_slot_free(zram);
 		ret = zram_bvec_read(zram, bvec, index, offset, bio);
 		up_read(&zram->lock);
 	} else {
 		down_write(&zram->lock);
 +		/*
 +		 * We should free the pending slots here. Otherwise the
 +		 * deferred work could free valid blocks under us.
 +		 */
 		handle_pending_slot_free(zram);
 		ret = zram_bvec_write(zram, bvec, index, offset);
 		up_write(&zram->lock);
@@ -552,7 +555,6 @@ static void zram_reset_device(struct zram *zram, bool reset_capacity)
 	size_t index;
 	struct zram_meta *meta;
 
-	flush_work(&zram->free_work);
 
 	down_write(&zram->init_lock);
 	if (!zram->init_done) {
@@ -560,6 +562,7 @@ static void zram_reset_device(struct zram *zram, bool reset_capacity)
 		return;
 	}
 
+	flush_work(&zram->free_work);
 	meta = zram->meta;
 	zram->init_done = 0;
 
 But the more ideal way I am thinking of now is:

1) replace init_lock with RCU
2) introduce a new meta atomic lock instead of zram->mutex, which is very coarse-grained.
3) use the atomic lock in the notify handler.

-- 
Kind regards,
Minchan Kim

^ permalink raw reply related	[flat|nested] 27+ messages in thread

* Re: [PATCH 1/2] staging: zram: fix handle_pending_slot_free() and zram_reset_device() race
  2013-09-16  0:02                   ` Minchan Kim
@ 2013-09-17 17:24                     ` Sergey Senozhatsky
  -1 siblings, 0 replies; 27+ messages in thread
From: Sergey Senozhatsky @ 2013-09-17 17:24 UTC (permalink / raw)
  To: Minchan Kim; +Cc: Dan Carpenter, Jerome Marchand, driverdev-devel, linux-kernel


Hello,

On (09/16/13 09:02), Minchan Kim wrote:
> Hello Sergey,
> 
> Sorry for really slow response. I was really busy by internal works
> and Thanks for pointing the BUG, Dan, Jerome and Sergey.
> I read your threads roughly so I may miss something. If so, sorry
> for that. Anyway I will put my opinion.
> 
> On Wed, Sep 11, 2013 at 02:12:50AM +0300, Sergey Senozhatsky wrote:
> > Dan Carpenter noted that handle_pending_slot_free() is racy with
> > zram_reset_device(). Take write init_lock in zram_slot_free(), thus
> 
> Right but "init_lock" is what I really want to remove.
> Yes. It's just read-side lock so most of time it doesn't hurt us but it
> makes code very complicated and deadlock prone so I'd like to replace
> it with RCU. Yeah, It's off topic but just let me put my opinion in
> future direction.
> 
> Abought the bug, how about moving flush_work below down_write(init_lock)?
> zram_make_request is already closed by init_lock and we have a rule about
> lock ordering as following so I don't see any problem.
> 
>   init_lock
>     zram->lock
> 
> > preventing any concurrent zram_slot_free(), zram_bvec_rw() or
> > zram_reset_device(). This also allows to safely check zram->init_done
> > in handle_pending_slot_free().
> > 
> > Initial intention was to minimze number of handle_pending_slot_free()
> > call from zram_bvec_rw(), which were slowing down READ requests due to
> > slot_free_lock spin lock. Jerome Marchand suggested to remove
> > handle_pending_slot_free() from zram_bvec_rw().
> > 
> > Link: https://lkml.org/lkml/2013/9/9/172
> > Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
> > 
> > ---
> > 
> >  drivers/staging/zram/zram_drv.c | 13 +++++--------
> >  1 file changed, 5 insertions(+), 8 deletions(-)
> > 
> > diff --git a/drivers/staging/zram/zram_drv.c b/drivers/staging/zram/zram_drv.c
> > index 91d94b5..7a2d4de 100644
> > --- a/drivers/staging/zram/zram_drv.c
> > +++ b/drivers/staging/zram/zram_drv.c
> > @@ -521,7 +521,8 @@ static void handle_pending_slot_free(struct zram *zram)
> >  	while (zram->slot_free_rq) {
> >  		free_rq = zram->slot_free_rq;
> >  		zram->slot_free_rq = free_rq->next;
> > -		zram_free_page(zram, free_rq->index);
> > +		if (zram->init_done)
> > +			zram_free_page(zram, free_rq->index);
> >  		kfree(free_rq);
> >  	}
> >  	spin_unlock(&zram->slot_free_lock);
> > @@ -534,16 +535,13 @@ static int zram_bvec_rw(struct zram *zram, struct bio_vec *bvec, u32 index,
> >  
> >  	if (rw == READ) {
> >  		down_read(&zram->lock);
> > -		handle_pending_slot_free(zram);
> 
> Read side is okay but actually I have a nitpick.
> If someone poll a block in zram-swap device, he would see a block
> has zero value suddenly although there was no I/O.(I don't want to argue
> it's sane user or not, anyway) it never happens on real block device and
> it never happens on zram-block device. Only it can happen zram-swap device.
> And such behavior was there since we introduced swap_slot_free_notify.
> (off-topic: I'd like to remove it because it makes tight coupling between
> zram and swap and obviously, it was layering violation function)
> so now, I don't have strong objection. 
> 
> The idea is to remove swap_slot_free_notify is to use frontswap when
> user want to use zram as swap so zram can be notified when the block
> lose the owner but still we should solve the mutex problem in notify
> handler.
> 
> 
> >  		ret = zram_bvec_read(zram, bvec, index, offset, bio);
> >  		up_read(&zram->lock);
> >  	} else {
> >  		down_write(&zram->lock);
> > -		handle_pending_slot_free(zram);
> 
> Why did you remove this in write-side?
> We can't expect when the work will trigger. It means the work could remove
> valid block under the us.
> 


I'm not sure I understand how.
zram_slot_free() takes down_write(&zram->init_lock) and zram_make_request()
takes down_read(&zram->init_lock), so zram_slot_free() cannot run concurrently
with any RW request: RW requests are under the read lock and zram_slot_free()
is under the write lock.
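
Roughly, the two paths look like this (bodies abridged; __zram_make_request()
and the error handling are paraphrased from memory, so treat it as a sketch,
not the exact code in the tree):

	static void zram_make_request(struct request_queue *queue, struct bio *bio)
	{
		struct zram *zram = queue->queuedata;

		/* every RW request holds init_lock for read */
		down_read(&zram->init_lock);
		if (likely(zram->init_done))
			__zram_make_request(zram, bio, bio_data_dir(bio));
		else
			bio_io_error(bio);
		up_read(&zram->init_lock);
	}

	static void zram_slot_free(struct work_struct *work)
	{
		struct zram *zram = container_of(work, struct zram, free_work);

		/* the write side excludes all in-flight RW requests */
		down_write(&zram->init_lock);
		handle_pending_slot_free(zram);
		up_write(&zram->init_lock);
	}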

> >  		ret = zram_bvec_write(zram, bvec, index, offset);
> >  		up_write(&zram->lock);
> >  	}
> > -
> >  	return ret;
> >  }
> >  
> > @@ -750,12 +748,11 @@ error:
> >  
> >  static void zram_slot_free(struct work_struct *work)
> >  {
> > -	struct zram *zram;
> > +	struct zram *zram = container_of(work, struct zram, free_work);
> >  
> > -	zram = container_of(work, struct zram, free_work);
> > -	down_write(&zram->lock);
> > +	down_write(&zram->init_lock);
> 
> I don't like this.
> Primary problem is we should handle it as atomic so that we should use
> spinlock instead of mutex. Yeah, /me kicks his ass. From the beginning,
> I should solve this problem as that way.
> 
> The simple solution popped from my mind is that
> 
> 
> diff --git a/drivers/staging/zram/zram_drv.c b/drivers/staging/zram/zram_drv.c
> index 91d94b5..b23bf0e 100644
> --- a/drivers/staging/zram/zram_drv.c
> +++ b/drivers/staging/zram/zram_drv.c
> @@ -534,11 +534,14 @@ static int zram_bvec_rw(struct zram *zram, struct bio_vec *bvec, u32 index,
>  
>  	if (rw == READ) {
>  		down_read(&zram->lock);
> -		handle_pending_slot_free(zram);
>  		ret = zram_bvec_read(zram, bvec, index, offset, bio);
>  		up_read(&zram->lock);
>  	} else {
>  		down_write(&zram->lock);
> +		/*
> +		 * We should free pending slot. Otherwise it would
> +		 * free valid blocks under the us.
> +		 */
>  		handle_pending_slot_free(zram);
>  		ret = zram_bvec_write(zram, bvec, index, offset);
>  		up_write(&zram->lock);
> @@ -552,7 +555,6 @@ static void zram_reset_device(struct zram *zram, bool reset_capacity)
>  	size_t index;
>  	struct zram_meta *meta;
>  
> -	flush_work(&zram->free_work);
>  
>  	down_write(&zram->init_lock);
>  	if (!zram->init_done) {
> @@ -560,6 +562,7 @@ static void zram_reset_device(struct zram *zram, bool reset_capacity)
>  		return;
>  	}
>  
> +	flush_work(&zram->free_work);
>  	meta = zram->meta;
>  	zram->init_done = 0;

this one looks ok to me.

	-ss

>  But more ideal way I am thinking now is 
> 
> 1) replace init_lock with RCU lock
> 2) introduce new meta atmoic lock instead of zram->mutex, which is very coarse-grained.
> 3) use atmoic lock in notify handler.
> 
> -- 
> Kind regards,
> Minchan Kim
> 

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 1/2] staging: zram: fix handle_pending_slot_free() and zram_reset_device() race
  2013-09-17 17:24                     ` Sergey Senozhatsky
@ 2013-09-23  4:24                       ` Minchan Kim
  -1 siblings, 0 replies; 27+ messages in thread
From: Minchan Kim @ 2013-09-23  4:24 UTC (permalink / raw)
  To: Sergey Senozhatsky
  Cc: Dan Carpenter, Jerome Marchand, driverdev-devel, linux-kernel

On Tue, Sep 17, 2013 at 08:24:45PM +0300, Sergey Senozhatsky wrote:
> 
> Hello,
> 
> On (09/16/13 09:02), Minchan Kim wrote:
> > Hello Sergey,
> > 
> > Sorry for really slow response. I was really busy by internal works
> > and Thanks for pointing the BUG, Dan, Jerome and Sergey.
> > I read your threads roughly so I may miss something. If so, sorry
> > for that. Anyway I will put my opinion.
> > 
> > On Wed, Sep 11, 2013 at 02:12:50AM +0300, Sergey Senozhatsky wrote:
> > > Dan Carpenter noted that handle_pending_slot_free() is racy with
> > > zram_reset_device(). Take write init_lock in zram_slot_free(), thus
> > 
> > Right but "init_lock" is what I really want to remove.
> > Yes. It's just read-side lock so most of time it doesn't hurt us but it
> > makes code very complicated and deadlock prone so I'd like to replace
> > it with RCU. Yeah, It's off topic but just let me put my opinion in
> > future direction.
> > 
> > Abought the bug, how about moving flush_work below down_write(init_lock)?
> > zram_make_request is already closed by init_lock and we have a rule about
> > lock ordering as following so I don't see any problem.
> > 
> >   init_lock
> >     zram->lock
> > 
> > > preventing any concurrent zram_slot_free(), zram_bvec_rw() or
> > > zram_reset_device(). This also allows to safely check zram->init_done
> > > in handle_pending_slot_free().
> > > 
> > > Initial intention was to minimze number of handle_pending_slot_free()
> > > call from zram_bvec_rw(), which were slowing down READ requests due to
> > > slot_free_lock spin lock. Jerome Marchand suggested to remove
> > > handle_pending_slot_free() from zram_bvec_rw().
> > > 
> > > Link: https://lkml.org/lkml/2013/9/9/172
> > > Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
> > > 
> > > ---
> > > 
> > >  drivers/staging/zram/zram_drv.c | 13 +++++--------
> > >  1 file changed, 5 insertions(+), 8 deletions(-)
> > > 
> > > diff --git a/drivers/staging/zram/zram_drv.c b/drivers/staging/zram/zram_drv.c
> > > index 91d94b5..7a2d4de 100644
> > > --- a/drivers/staging/zram/zram_drv.c
> > > +++ b/drivers/staging/zram/zram_drv.c
> > > @@ -521,7 +521,8 @@ static void handle_pending_slot_free(struct zram *zram)
> > >  	while (zram->slot_free_rq) {
> > >  		free_rq = zram->slot_free_rq;
> > >  		zram->slot_free_rq = free_rq->next;
> > > -		zram_free_page(zram, free_rq->index);
> > > +		if (zram->init_done)
> > > +			zram_free_page(zram, free_rq->index);
> > >  		kfree(free_rq);
> > >  	}
> > >  	spin_unlock(&zram->slot_free_lock);
> > > @@ -534,16 +535,13 @@ static int zram_bvec_rw(struct zram *zram, struct bio_vec *bvec, u32 index,
> > >  
> > >  	if (rw == READ) {
> > >  		down_read(&zram->lock);
> > > -		handle_pending_slot_free(zram);
> > 
> > Read side is okay but actually I have a nitpick.
> > If someone poll a block in zram-swap device, he would see a block
> > has zero value suddenly although there was no I/O.(I don't want to argue
> > it's sane user or not, anyway) it never happens on real block device and
> > it never happens on zram-block device. Only it can happen zram-swap device.
> > And such behavior was there since we introduced swap_slot_free_notify.
> > (off-topic: I'd like to remove it because it makes tight coupling between
> > zram and swap and obviously, it was layering violation function)
> > so now, I don't have strong objection. 
> > 
> > The idea is to remove swap_slot_free_notify is to use frontswap when
> > user want to use zram as swap so zram can be notified when the block
> > lose the owner but still we should solve the mutex problem in notify
> > handler.
> > 
> > 
> > >  		ret = zram_bvec_read(zram, bvec, index, offset, bio);
> > >  		up_read(&zram->lock);
> > >  	} else {
> > >  		down_write(&zram->lock);
> > > -		handle_pending_slot_free(zram);
> > 
> > Why did you remove this in write-side?
> > We can't expect when the work will trigger. It means the work could remove
> > valid block under the us.
> > 
> 
> 
> not sure I understand how.
> zram_slot_free() takes down_write(&zram->init_lock) and zram_make_request() takes
> down_read(&zram->init_lock), thus zram_slot_free() can not concurrently work with
> any RW requests. RW requests are under read() lock and zram_slot_free() is under
> write() lock.

Let's consider an example.
The swap subsystem asks zram, via swap_slot_free_notify, to treat block "A"
as free from now on, but zram only queues the request without really freeing
the block. Swap then reuses block "A" for new data, because from its point of
view "A" is free. When the long-pending free request is finally handled, zram
blindly frees the new data sitting in block "A". :(
That's why we should handle pending free requests right before a zram write.
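
Schematically (the two-column split is just for illustration):

	swap side                           zram side
	---------                           ---------
	swap_slot_free_notify(A)
	  -> A queued on slot_free_rq,
	     free_work scheduled
	reuse A, submit write to A
	                                    zram_bvec_write() stores new data at A
	                                    free_work finally runs:
	                                    handle_pending_slot_free()
	                                      -> zram_free_page(A)  /* frees the new data */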

Another way to optimize the lock overhead is to check whether the block is
pending free right before zram_free_page() in the write path and, if so,
remove the pending request from the slot_free_rq list so it cannot free a
valid block later. But for that we would need a more complex data structure
to find the block quickly, plus extra checking code right before
zram_free_page(), which would make the code rather complicated.
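
Even with the current singly-linked slot_free_rq list such a check would look
roughly like the sketch below (zram_cancel_pending_free() is a made-up name,
nothing like it exists in the tree), and it is still a linear walk under
slot_free_lock for every overwrite:

	/*
	 * Hypothetical helper: drop any queued free request for @index so
	 * the deferred work cannot free the block we are about to rewrite.
	 * Would be called right before zram_free_page() in the write path.
	 */
	static void zram_cancel_pending_free(struct zram *zram, u32 index)
	{
		struct zram_slot_free *free_rq, **prev;

		spin_lock(&zram->slot_free_lock);
		prev = &zram->slot_free_rq;
		while ((free_rq = *prev) != NULL) {
			if (free_rq->index == index) {
				*prev = free_rq->next;
				kfree(free_rq);
				continue;
			}
			prev = &free_rq->next;
		}
		spin_unlock(&zram->slot_free_lock);
	}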

So, do you have any real workload that would make us consider this really
troublesome? Otherwise, I'd like to keep the code simple.

> 
> > >  		ret = zram_bvec_write(zram, bvec, index, offset);
> > >  		up_write(&zram->lock);
> > >  	}
> > > -
> > >  	return ret;
> > >  }
> > >  
> > > @@ -750,12 +748,11 @@ error:
> > >  
> > >  static void zram_slot_free(struct work_struct *work)
> > >  {
> > > -	struct zram *zram;
> > > +	struct zram *zram = container_of(work, struct zram, free_work);
> > >  
> > > -	zram = container_of(work, struct zram, free_work);
> > > -	down_write(&zram->lock);
> > > +	down_write(&zram->init_lock);
> > 
> > I don't like this.
> > Primary problem is we should handle it as atomic so that we should use
> > spinlock instead of mutex. Yeah, /me kicks his ass. From the beginning,
> > I should solve this problem as that way.
> > 
> > The simple solution popped from my mind is that
> > 
> > 
> > diff --git a/drivers/staging/zram/zram_drv.c b/drivers/staging/zram/zram_drv.c
> > index 91d94b5..b23bf0e 100644
> > --- a/drivers/staging/zram/zram_drv.c
> > +++ b/drivers/staging/zram/zram_drv.c
> > @@ -534,11 +534,14 @@ static int zram_bvec_rw(struct zram *zram, struct bio_vec *bvec, u32 index,
> >  
> >  	if (rw == READ) {
> >  		down_read(&zram->lock);
> > -		handle_pending_slot_free(zram);
> >  		ret = zram_bvec_read(zram, bvec, index, offset, bio);
> >  		up_read(&zram->lock);
> >  	} else {
> >  		down_write(&zram->lock);
> > +		/*
> > +		 * We should free pending slot. Otherwise it would
> > +		 * free valid blocks under the us.
> > +		 */
> >  		handle_pending_slot_free(zram);
> >  		ret = zram_bvec_write(zram, bvec, index, offset);
> >  		up_write(&zram->lock);
> > @@ -552,7 +555,6 @@ static void zram_reset_device(struct zram *zram, bool reset_capacity)
> >  	size_t index;
> >  	struct zram_meta *meta;
> >  
> > -	flush_work(&zram->free_work);
> >  
> >  	down_write(&zram->init_lock);
> >  	if (!zram->init_done) {
> > @@ -560,6 +562,7 @@ static void zram_reset_device(struct zram *zram, bool reset_capacity)
> >  		return;
> >  	}
> >  
> > +	flush_work(&zram->free_work);
> >  	meta = zram->meta;
> >  	zram->init_done = 0;
> 
> this one looks ok to me.

If you don't mind, I'd like to go with this.
Thanks.
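
For reference, with that change the reset path would start roughly like this
(reconstructed from the diff above; the early-return unlock is assumed and
the rest of the teardown is elided):

	static void zram_reset_device(struct zram *zram, bool reset_capacity)
	{
		size_t index;
		struct zram_meta *meta;

		down_write(&zram->init_lock);
		if (!zram->init_done) {
			up_write(&zram->init_lock);
			return;
		}

		/* drain pending slot-free work with init_lock held */
		flush_work(&zram->free_work);

		meta = zram->meta;
		zram->init_done = 0;
		/* ... free meta, reset stats, optionally reset capacity ... */
	}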


> 
> 	-ss
> 
> >  But more ideal way I am thinking now is 
> > 
> > 1) replace init_lock with RCU lock
> > 2) introduce new meta atmoic lock instead of zram->mutex, which is very coarse-grained.
> > 3) use atmoic lock in notify handler.
> > 
> > -- 
> > Kind regards,
> > Minchan Kim
> > 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/

-- 
Kind regards,
Minchan Kim

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 1/2] staging: zram: fix handle_pending_slot_free() and zram_reset_device() race
  2013-09-23  4:24                       ` Minchan Kim
@ 2013-09-23  8:42                         ` Sergey Senozhatsky
  -1 siblings, 0 replies; 27+ messages in thread
From: Sergey Senozhatsky @ 2013-09-23  8:42 UTC (permalink / raw)
  To: Minchan Kim; +Cc: Dan Carpenter, Jerome Marchand, driverdev-devel, linux-kernel

On (09/23/13 13:24), Minchan Kim wrote:
> > On (09/16/13 09:02), Minchan Kim wrote:
> > > Hello Sergey,
> > > 
> > > Sorry for really slow response. I was really busy by internal works
> > > and Thanks for pointing the BUG, Dan, Jerome and Sergey.
> > > I read your threads roughly so I may miss something. If so, sorry
> > > for that. Anyway I will put my opinion.
> > > 
> > > On Wed, Sep 11, 2013 at 02:12:50AM +0300, Sergey Senozhatsky wrote:
> > > > Dan Carpenter noted that handle_pending_slot_free() is racy with
> > > > zram_reset_device(). Take write init_lock in zram_slot_free(), thus
> > > 
> > > Right but "init_lock" is what I really want to remove.
> > > Yes. It's just read-side lock so most of time it doesn't hurt us but it
> > > makes code very complicated and deadlock prone so I'd like to replace
> > > it with RCU. Yeah, It's off topic but just let me put my opinion in
> > > future direction.
> > > 
> > > Abought the bug, how about moving flush_work below down_write(init_lock)?
> > > zram_make_request is already closed by init_lock and we have a rule about
> > > lock ordering as following so I don't see any problem.
> > > 
> > >   init_lock
> > >     zram->lock
> > > 
> > > > preventing any concurrent zram_slot_free(), zram_bvec_rw() or
> > > > zram_reset_device(). This also allows to safely check zram->init_done
> > > > in handle_pending_slot_free().
> > > > 
> > > > Initial intention was to minimze number of handle_pending_slot_free()
> > > > call from zram_bvec_rw(), which were slowing down READ requests due to
> > > > slot_free_lock spin lock. Jerome Marchand suggested to remove
> > > > handle_pending_slot_free() from zram_bvec_rw().
> > > > 
> > > > Link: https://lkml.org/lkml/2013/9/9/172
> > > > Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
> > > > 
> > > > ---
> > > > 
> > > >  drivers/staging/zram/zram_drv.c | 13 +++++--------
> > > >  1 file changed, 5 insertions(+), 8 deletions(-)
> > > > 
> > > > diff --git a/drivers/staging/zram/zram_drv.c b/drivers/staging/zram/zram_drv.c
> > > > index 91d94b5..7a2d4de 100644
> > > > --- a/drivers/staging/zram/zram_drv.c
> > > > +++ b/drivers/staging/zram/zram_drv.c
> > > > @@ -521,7 +521,8 @@ static void handle_pending_slot_free(struct zram *zram)
> > > >  	while (zram->slot_free_rq) {
> > > >  		free_rq = zram->slot_free_rq;
> > > >  		zram->slot_free_rq = free_rq->next;
> > > > -		zram_free_page(zram, free_rq->index);
> > > > +		if (zram->init_done)
> > > > +			zram_free_page(zram, free_rq->index);
> > > >  		kfree(free_rq);
> > > >  	}
> > > >  	spin_unlock(&zram->slot_free_lock);
> > > > @@ -534,16 +535,13 @@ static int zram_bvec_rw(struct zram *zram, struct bio_vec *bvec, u32 index,
> > > >  
> > > >  	if (rw == READ) {
> > > >  		down_read(&zram->lock);
> > > > -		handle_pending_slot_free(zram);
> > > 
> > > Read side is okay but actually I have a nitpick.
> > > If someone poll a block in zram-swap device, he would see a block
> > > has zero value suddenly although there was no I/O.(I don't want to argue
> > > it's sane user or not, anyway) it never happens on real block device and
> > > it never happens on zram-block device. Only it can happen zram-swap device.
> > > And such behavior was there since we introduced swap_slot_free_notify.
> > > (off-topic: I'd like to remove it because it makes tight coupling between
> > > zram and swap and obviously, it was layering violation function)
> > > so now, I don't have strong objection. 
> > > 
> > > The idea is to remove swap_slot_free_notify is to use frontswap when
> > > user want to use zram as swap so zram can be notified when the block
> > > lose the owner but still we should solve the mutex problem in notify
> > > handler.
> > > 
> > > 
> > > >  		ret = zram_bvec_read(zram, bvec, index, offset, bio);
> > > >  		up_read(&zram->lock);
> > > >  	} else {
> > > >  		down_write(&zram->lock);
> > > > -		handle_pending_slot_free(zram);
> > > 
> > > Why did you remove this in write-side?
> > > We can't expect when the work will trigger. It means the work could remove
> > > valid block under the us.
> > > 
> > 
> > 
> > not sure I understand how.
> > zram_slot_free() takes down_write(&zram->init_lock) and zram_make_request() takes
> > down_read(&zram->init_lock), thus zram_slot_free() can not concurrently work with
> > any RW requests. RW requests are under read() lock and zram_slot_free() is under
> > write() lock.
> 
> Let's consider example.
> Swap subsystem asked to zram "A" block free from now by swap_slot_free_notify
> but zram had been pended it without real freeing.
> Swap reused "A" block for new data because "A" block was free but request pended
> for a long time just handled and zram blindly free new data on the "A" block. :(
> That's why we should handle pending free request right before zram-write.
> 
> Another try to optimize the lock overhead is to check the block is pending for free
> right before zram_free_page in write path. If so, we should remove pending reuqest
> from slot_free_rq list to prevent valid block later. But for that case, we need
> more complex data structure to find the block fast and many checking code right
> before zram_free_page so that it would make code rather complicated.
> 
> So, do you have any real workload for us to consider it's really troublesome?
> Otherwise, I'd like to keep the code simple.
> 
> > 
> > > >  		ret = zram_bvec_write(zram, bvec, index, offset);
> > > >  		up_write(&zram->lock);
> > > >  	}
> > > > -
> > > >  	return ret;
> > > >  }
> > > >  
> > > > @@ -750,12 +748,11 @@ error:
> > > >  
> > > >  static void zram_slot_free(struct work_struct *work)
> > > >  {
> > > > -	struct zram *zram;
> > > > +	struct zram *zram = container_of(work, struct zram, free_work);
> > > >  
> > > > -	zram = container_of(work, struct zram, free_work);
> > > > -	down_write(&zram->lock);
> > > > +	down_write(&zram->init_lock);
> > > 
> > > I don't like this.
> > > Primary problem is we should handle it as atomic so that we should use
> > > spinlock instead of mutex. Yeah, /me kicks his ass. From the beginning,
> > > I should solve this problem as that way.
> > > 
> > > The simple solution popped from my mind is that
> > > 
> > > 
> > > diff --git a/drivers/staging/zram/zram_drv.c b/drivers/staging/zram/zram_drv.c
> > > index 91d94b5..b23bf0e 100644
> > > --- a/drivers/staging/zram/zram_drv.c
> > > +++ b/drivers/staging/zram/zram_drv.c
> > > @@ -534,11 +534,14 @@ static int zram_bvec_rw(struct zram *zram, struct bio_vec *bvec, u32 index,
> > >  
> > >  	if (rw == READ) {
> > >  		down_read(&zram->lock);
> > > -		handle_pending_slot_free(zram);
> > >  		ret = zram_bvec_read(zram, bvec, index, offset, bio);
> > >  		up_read(&zram->lock);
> > >  	} else {
> > >  		down_write(&zram->lock);
> > > +		/*
> > > +		 * We should free pending slot. Otherwise it would
> > > +		 * free valid blocks under the us.
> > > +		 */
> > >  		handle_pending_slot_free(zram);
> > >  		ret = zram_bvec_write(zram, bvec, index, offset);
> > >  		up_write(&zram->lock);
> > > @@ -552,7 +555,6 @@ static void zram_reset_device(struct zram *zram, bool reset_capacity)
> > >  	size_t index;
> > >  	struct zram_meta *meta;
> > >  
> > > -	flush_work(&zram->free_work);
> > >  
> > >  	down_write(&zram->init_lock);
> > >  	if (!zram->init_done) {
> > > @@ -560,6 +562,7 @@ static void zram_reset_device(struct zram *zram, bool reset_capacity)
> > >  		return;
> > >  	}
> > >  
> > > +	flush_work(&zram->free_work);
> > >  	meta = zram->meta;
> > >  	zram->init_done = 0;
> > 
> > this one looks ok to me.
> 
> If you don't mind, I'd like to go with this.
> Thanks.

sure, no objections.

	-ss

> 
> > 
> > 	-ss
> > 
> > >  But more ideal way I am thinking now is 
> > > 
> > > 1) replace init_lock with RCU lock
> > > 2) introduce new meta atmoic lock instead of zram->mutex, which is very coarse-grained.
> > > 3) use atmoic lock in notify handler.
> > > 
> > > -- 
> > > Kind regards,
> > > Minchan Kim
> > > 
> > --
> > To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> > the body of a message to majordomo@vger.kernel.org
> > More majordomo info at  http://vger.kernel.org/majordomo-info.html
> > Please read the FAQ at  http://www.tux.org/lkml/
> 
> -- 
> Kind regards,
> Minchan Kim
> 

^ permalink raw reply	[flat|nested] 27+ messages in thread

end of thread, other threads:[~2013-09-23  9:07 UTC | newest]

Thread overview: 27+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2013-09-06 15:12 [PATCH 1/2] staging: zram: minimize `slot_free_lock' usage (v2) Sergey Senozhatsky
2013-09-09 12:33 ` Dan Carpenter
2013-09-09 12:49   ` Sergey Senozhatsky
2013-09-09 13:21     ` Dan Carpenter
2013-09-09 13:46       ` Jerome Marchand
2013-09-09 16:10         ` Jerome Marchand
2013-09-10 14:34           ` Sergey Senozhatsky
2013-09-10 14:58             ` Dan Carpenter
2013-09-10 15:15               ` Greg Kroah-Hartman
2013-09-10 23:12               ` [PATCH 1/2] staging: zram: fix handle_pending_slot_free() and zram_reset_device() race Sergey Senozhatsky
2013-09-10 23:12                 ` Sergey Senozhatsky
2013-09-12 22:12                 ` Greg KH
2013-09-13  9:17                   ` Sergey Senozhatsky
2013-09-16  0:02                 ` Minchan Kim
2013-09-16  0:02                   ` Minchan Kim
2013-09-17 17:24                   ` Sergey Senozhatsky
2013-09-17 17:24                     ` Sergey Senozhatsky
2013-09-23  4:24                     ` Minchan Kim
2013-09-23  4:24                       ` Minchan Kim
2013-09-23  8:42                       ` Sergey Senozhatsky
2013-09-23  8:42                         ` Sergey Senozhatsky
2013-09-10 23:19               ` [PATCH 2/2] staging: zram: remove init_done from zram struct (v3) Sergey Senozhatsky
2013-09-10 23:19                 ` Sergey Senozhatsky
2013-09-10 23:27             ` [PATCH 1/2] staging: zram: minimize `slot_free_lock' usage (v2) Sergey Senozhatsky
2013-09-09 14:42       ` Sergey Senozhatsky
2013-09-09 14:52         ` Dan Carpenter
2013-09-09 15:09           ` Sergey Senozhatsky
