linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
* mm/swap: possible problem introduced when replacing REQ_NOIDLE with REQ_IDLE
@ 2019-08-09  8:39 Pavel Tikhomirov
  2019-11-11 14:52 ` Pavel Tikhomirov
  0 siblings, 1 reply; 2+ messages in thread
From: Pavel Tikhomirov @ 2019-08-09  8:39 UTC (permalink / raw)
  To: linux-mm, linux-kernel
  Cc: Christoph Hellwig, Jens Axboe, Andrew Morton, Dennis Zhou,
	Josef Bacik, Tejun Heo, Huang Ying, Oleg Nesterov, Omar Sandoval,
	Pavel Tikhomirov

Hi, all.

Then porting patches from mainstream I've found some strange code:

 > commit a2b809672ee6fcb4d5756ea815725b3dbaea654e
 > Author: Christoph Hellwig <hch@lst.de>
 > Date:   Tue Nov 1 07:40:09 2016 -0600
 >
 >     block: replace REQ_NOIDLE with REQ_IDLE
 >
 >     Noidle should be the default for writes as seen by all the compounds
 >     definitions in fs.h using it.  In fact only direct I/O really should
 >     be using NODILE, so turn the whole flag around to get the defaults
 >     right, which will make our life much easier especially onces the
 >     WRITE_* defines go away.
 >
 >     This assumes all the existing "raw" users of REQ_SYNC for writes
 >     want noidle behavior, which seems to be spot on from a quick audit.
 >
 >     Signed-off-by: Christoph Hellwig <hch@lst.de>
 >     Signed-off-by: Jens Axboe <axboe@fb.com>
 >
 > diff --git a/include/linux/fs.h b/include/linux/fs.h
 > index ccedccb28ec8..46a74209917f 100644
 > --- a/include/linux/fs.h
 > +++ b/include/linux/fs.h
 > @@ -197,11 +197,11 @@ typedef int (dio_iodone_t)(struct kiocb *iocb, 
loff_t offset,
 >  #define WRITE                  REQ_OP_WRITE
 >
 >  #define READ_SYNC              0
 > -#define WRITE_SYNC             (REQ_SYNC | REQ_NOIDLE)
 > -#define WRITE_ODIRECT          REQ_SYNC
 > -#define WRITE_FLUSH            (REQ_NOIDLE | REQ_PREFLUSH)
 > -#define WRITE_FUA              (REQ_NOIDLE | REQ_FUA)
 > -#define WRITE_FLUSH_FUA                (REQ_NOIDLE | REQ_PREFLUSH | 
REQ_FUA)
 > +#define WRITE_SYNC             REQ_SYNC
 > +#define WRITE_ODIRECT          (REQ_SYNC | REQ_IDLE)
 > +#define WRITE_FLUSH            REQ_PREFLUSH
 > +#define WRITE_FUA              REQ_FUA
 > +#define WRITE_FLUSH_FUA                (REQ_PREFLUSH | REQ_FUA)
 >
 >  /*
 >   * Attribute flags.  These should be or-ed together to figure out what

The above commit changes the meaning of the REQ_SYNC flag, before the 
patch it was equal to WRITE_ODIRECT and after the patch it is equal to 
WRITE_SYNC. And thus I think it became treated differently (I see only 
one place left in wbt_should_throttle.).

But in __swap_writepage() both before and after the mentioned patch we 
still pass a single REQ_SYNC without any REQ_IDLE/REQ_UNIDLE:

 > [snorch@snorch linux]$ git blame 
a2b809672ee6fcb4d5756ea815725b3dbaea654e^ mm/page_io.c | grep -a5 REQ_SYNC
 > ^1da177e4c3f4 (Linus Torvalds        2005-04-16 15:20:36 -0700 319) 
         unlock_page(page);
 > ^1da177e4c3f4 (Linus Torvalds        2005-04-16 15:20:36 -0700 320) 
         ret = -ENOMEM;
 > ^1da177e4c3f4 (Linus Torvalds        2005-04-16 15:20:36 -0700 321) 
         goto out;
 > ^1da177e4c3f4 (Linus Torvalds        2005-04-16 15:20:36 -0700 322)   }
 > ^1da177e4c3f4 (Linus Torvalds        2005-04-16 15:20:36 -0700 323) 
if (wbc->sync_mode == WB_SYNC_ALL)
 > ba13e83ec334c (Jens Axboe            2016-08-01 09:38:44 -0600 324) 
         bio_set_op_attrs(bio, REQ_OP_WRITE, REQ_SYNC);
 > ba13e83ec334c (Jens Axboe            2016-08-01 09:38:44 -0600 325) 
else
 > ba13e83ec334c (Jens Axboe            2016-08-01 09:38:44 -0600 326) 
         bio_set_op_attrs(bio, REQ_OP_WRITE, 0);
 > f8891e5e1f93a (Christoph Lameter     2006-06-30 01:55:45 -0700 327) 
count_vm_event(PSWPOUT);
 > ^1da177e4c3f4 (Linus Torvalds        2005-04-16 15:20:36 -0700 328) 
set_page_writeback(page);
 > ^1da177e4c3f4 (Linus Torvalds        2005-04-16 15:20:36 -0700 329) 
unlock_page(page);
 > [snorch@snorch linux]$ git blame 
a2b809672ee6fcb4d5756ea815725b3dbaea654e mm/page_io.c | grep -a5 REQ_SYNC
 > ^1da177e4c3f4 (Linus Torvalds        2005-04-16 15:20:36 -0700 319) 
         unlock_page(page);
 > ^1da177e4c3f4 (Linus Torvalds        2005-04-16 15:20:36 -0700 320) 
         ret = -ENOMEM;
 > ^1da177e4c3f4 (Linus Torvalds        2005-04-16 15:20:36 -0700 321) 
         goto out;
 > ^1da177e4c3f4 (Linus Torvalds        2005-04-16 15:20:36 -0700 322)   }
 > ^1da177e4c3f4 (Linus Torvalds        2005-04-16 15:20:36 -0700 323) 
if (wbc->sync_mode == WB_SYNC_ALL)
 > ba13e83ec334c (Jens Axboe            2016-08-01 09:38:44 -0600 324) 
         bio_set_op_attrs(bio, REQ_OP_WRITE, REQ_SYNC);
 > ba13e83ec334c (Jens Axboe            2016-08-01 09:38:44 -0600 325) 
else
 > ba13e83ec334c (Jens Axboe            2016-08-01 09:38:44 -0600 326) 
         bio_set_op_attrs(bio, REQ_OP_WRITE, 0);
 > f8891e5e1f93a (Christoph Lameter     2006-06-30 01:55:45 -0700 327) 
count_vm_event(PSWPOUT);
 > ^1da177e4c3f4 (Linus Torvalds        2005-04-16 15:20:36 -0700 328) 
set_page_writeback(page);
 > ^1da177e4c3f4 (Linus Torvalds        2005-04-16 15:20:36 -0700 329) 
unlock_page(page);

It looks like we've changed the way how we handle swap page writes from 
"odirect" way to "regular" sync write way, these can be wrong. This may 
also affect deprecated cfq io-scheduler on older kernels.

Thanks in advance for any advice on what to do with these, may be I miss 
something.

-- 
Best regards, Tikhomirov Pavel
Software Developer, Virtuozzo.

^ permalink raw reply	[flat|nested] 2+ messages in thread

* Re: mm/swap: possible problem introduced when replacing REQ_NOIDLE with REQ_IDLE
  2019-08-09  8:39 mm/swap: possible problem introduced when replacing REQ_NOIDLE with REQ_IDLE Pavel Tikhomirov
@ 2019-11-11 14:52 ` Pavel Tikhomirov
  0 siblings, 0 replies; 2+ messages in thread
From: Pavel Tikhomirov @ 2019-11-11 14:52 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: linux-mm, linux-kernel, Jens Axboe, Andrew Morton, Dennis Zhou,
	Josef Bacik, Tejun Heo, Huang Ying, Oleg Nesterov, Omar Sandoval

Christoph, a2b809672ee6 is your patch, maybe you have some ideas about 
these problem?

ping

On 8/9/19 11:39 AM, Pavel Tikhomirov wrote:
> Hi, all.
> 
> Then porting patches from mainstream I've found some strange code:
> 
>   > commit a2b809672ee6fcb4d5756ea815725b3dbaea654e
>   > Author: Christoph Hellwig <hch@lst.de>
>   > Date:   Tue Nov 1 07:40:09 2016 -0600
>   >
>   >     block: replace REQ_NOIDLE with REQ_IDLE
>   >
>   >     Noidle should be the default for writes as seen by all the compounds
>   >     definitions in fs.h using it.  In fact only direct I/O really should
>   >     be using NODILE, so turn the whole flag around to get the defaults
>   >     right, which will make our life much easier especially onces the
>   >     WRITE_* defines go away.
>   >
>   >     This assumes all the existing "raw" users of REQ_SYNC for writes
>   >     want noidle behavior, which seems to be spot on from a quick audit.
>   >
>   >     Signed-off-by: Christoph Hellwig <hch@lst.de>
>   >     Signed-off-by: Jens Axboe <axboe@fb.com>
>   >
>   > diff --git a/include/linux/fs.h b/include/linux/fs.h
>   > index ccedccb28ec8..46a74209917f 100644
>   > --- a/include/linux/fs.h
>   > +++ b/include/linux/fs.h
>   > @@ -197,11 +197,11 @@ typedef int (dio_iodone_t)(struct kiocb *iocb,
> loff_t offset,
>   >  #define WRITE                  REQ_OP_WRITE
>   >
>   >  #define READ_SYNC              0
>   > -#define WRITE_SYNC             (REQ_SYNC | REQ_NOIDLE)
>   > -#define WRITE_ODIRECT          REQ_SYNC
>   > -#define WRITE_FLUSH            (REQ_NOIDLE | REQ_PREFLUSH)
>   > -#define WRITE_FUA              (REQ_NOIDLE | REQ_FUA)
>   > -#define WRITE_FLUSH_FUA                (REQ_NOIDLE | REQ_PREFLUSH |
> REQ_FUA)
>   > +#define WRITE_SYNC             REQ_SYNC
>   > +#define WRITE_ODIRECT          (REQ_SYNC | REQ_IDLE)
>   > +#define WRITE_FLUSH            REQ_PREFLUSH
>   > +#define WRITE_FUA              REQ_FUA
>   > +#define WRITE_FLUSH_FUA                (REQ_PREFLUSH | REQ_FUA)
>   >
>   >  /*
>   >   * Attribute flags.  These should be or-ed together to figure out what
> 
> The above commit changes the meaning of the REQ_SYNC flag, before the
> patch it was equal to WRITE_ODIRECT and after the patch it is equal to
> WRITE_SYNC. And thus I think it became treated differently (I see only
> one place left in wbt_should_throttle.).
> 
> But in __swap_writepage() both before and after the mentioned patch we
> still pass a single REQ_SYNC without any REQ_IDLE/REQ_UNIDLE:
> 
>   > [snorch@snorch linux]$ git blame
> a2b809672ee6fcb4d5756ea815725b3dbaea654e^ mm/page_io.c | grep -a5 REQ_SYNC
>   > ^1da177e4c3f4 (Linus Torvalds        2005-04-16 15:20:36 -0700 319)
>           unlock_page(page);
>   > ^1da177e4c3f4 (Linus Torvalds        2005-04-16 15:20:36 -0700 320)
>           ret = -ENOMEM;
>   > ^1da177e4c3f4 (Linus Torvalds        2005-04-16 15:20:36 -0700 321)
>           goto out;
>   > ^1da177e4c3f4 (Linus Torvalds        2005-04-16 15:20:36 -0700 322)   }
>   > ^1da177e4c3f4 (Linus Torvalds        2005-04-16 15:20:36 -0700 323)
> if (wbc->sync_mode == WB_SYNC_ALL)
>   > ba13e83ec334c (Jens Axboe            2016-08-01 09:38:44 -0600 324)
>           bio_set_op_attrs(bio, REQ_OP_WRITE, REQ_SYNC);
>   > ba13e83ec334c (Jens Axboe            2016-08-01 09:38:44 -0600 325)
> else
>   > ba13e83ec334c (Jens Axboe            2016-08-01 09:38:44 -0600 326)
>           bio_set_op_attrs(bio, REQ_OP_WRITE, 0);
>   > f8891e5e1f93a (Christoph Lameter     2006-06-30 01:55:45 -0700 327)
> count_vm_event(PSWPOUT);
>   > ^1da177e4c3f4 (Linus Torvalds        2005-04-16 15:20:36 -0700 328)
> set_page_writeback(page);
>   > ^1da177e4c3f4 (Linus Torvalds        2005-04-16 15:20:36 -0700 329)
> unlock_page(page);
>   > [snorch@snorch linux]$ git blame
> a2b809672ee6fcb4d5756ea815725b3dbaea654e mm/page_io.c | grep -a5 REQ_SYNC
>   > ^1da177e4c3f4 (Linus Torvalds        2005-04-16 15:20:36 -0700 319)
>           unlock_page(page);
>   > ^1da177e4c3f4 (Linus Torvalds        2005-04-16 15:20:36 -0700 320)
>           ret = -ENOMEM;
>   > ^1da177e4c3f4 (Linus Torvalds        2005-04-16 15:20:36 -0700 321)
>           goto out;
>   > ^1da177e4c3f4 (Linus Torvalds        2005-04-16 15:20:36 -0700 322)   }
>   > ^1da177e4c3f4 (Linus Torvalds        2005-04-16 15:20:36 -0700 323)
> if (wbc->sync_mode == WB_SYNC_ALL)
>   > ba13e83ec334c (Jens Axboe            2016-08-01 09:38:44 -0600 324)
>           bio_set_op_attrs(bio, REQ_OP_WRITE, REQ_SYNC);
>   > ba13e83ec334c (Jens Axboe            2016-08-01 09:38:44 -0600 325)
> else
>   > ba13e83ec334c (Jens Axboe            2016-08-01 09:38:44 -0600 326)
>           bio_set_op_attrs(bio, REQ_OP_WRITE, 0);
>   > f8891e5e1f93a (Christoph Lameter     2006-06-30 01:55:45 -0700 327)
> count_vm_event(PSWPOUT);
>   > ^1da177e4c3f4 (Linus Torvalds        2005-04-16 15:20:36 -0700 328)
> set_page_writeback(page);
>   > ^1da177e4c3f4 (Linus Torvalds        2005-04-16 15:20:36 -0700 329)
> unlock_page(page);
> 
> It looks like we've changed the way how we handle swap page writes from
> "odirect" way to "regular" sync write way, these can be wrong. This may
> also affect deprecated cfq io-scheduler on older kernels.
> 
> Thanks in advance for any advice on what to do with these, may be I miss
> something.
> 

-- 
Best regards, Tikhomirov Pavel
Software Developer, Virtuozzo.

^ permalink raw reply	[flat|nested] 2+ messages in thread

end of thread, other threads:[~2019-11-11 14:52 UTC | newest]

Thread overview: 2+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-08-09  8:39 mm/swap: possible problem introduced when replacing REQ_NOIDLE with REQ_IDLE Pavel Tikhomirov
2019-11-11 14:52 ` Pavel Tikhomirov

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).