Discussion:
[PATCH] xfs: errors on sync superblock writes leave it locked
(too old to reply)
Dave Chinner
2011-01-04 04:50:09 UTC
Permalink
From: Dave Chinner <***@redhat.com>

If we get an IO error on a synchronous superblock write, we attach a
error release function to it so that when the last reference goes
away the release function is called and the buffer is invalidated
and unlocked. The buffer is left locked until the release function
is called so that other concurrent users of the buffer will be
locked out until the buffer error is fully processed.

Unfortunately, for the superblock buffer the filesyetm itself holds
a reference to the buffer which prevents the reference count from
dropping to zero and the release function being called. As a result,
once an IO error occurs on a sync write, the buffer will never be
unlocked and all future attempts to lock the buffer will hang.

To make matters worse, this problems is not unique to such buffers;
if there is a concurrent _xfs_buf_find() running, the lookup will
grab a reference to the buffer and then wait on the buffer lock,
preventing the reference count from ever falling to zero and hence
unlocking the buffer.

As such, the whole b_relse function implementation is broken because
it cannot rely on the buffer reference count falling to zero to
unlock the errored buffer. The synchronous write error path is the
only path that uses this callback - it is used to ensure that the
synchronous waiter gets the buffer error before the error state is
cleared from the buffer by the release function.

Given that the only sychronous buffer writes now go through
xfs_bwrite() and the error path in question can only occur for a
write of a dirty, logged buffer, we can call the b_relse function
when an error is detected in xfs_bwrite() after calling
xfs_buf_iowait(). The subsequent xfs_buf_relse() call in
xfs_bwrite() will then unlock the buffer and everything should
continue as per normal.

Signed-off-by: Dave Chinner <***@redhat.com>
---
fs/xfs/linux-2.6/xfs_buf.c | 17 +++++++++++------
1 files changed, 11 insertions(+), 6 deletions(-)

diff --git a/fs/xfs/linux-2.6/xfs_buf.c b/fs/xfs/linux-2.6/xfs_buf.c
index 92f1f2a..1775269 100644
--- a/fs/xfs/linux-2.6/xfs_buf.c
+++ b/fs/xfs/linux-2.6/xfs_buf.c
@@ -908,11 +908,8 @@ xfs_buf_rele(

ASSERT(atomic_read(&bp->b_hold) > 0);
if (atomic_dec_and_lock(&bp->b_hold, &pag->pag_buf_lock)) {
- if (bp->b_relse) {
- atomic_inc(&bp->b_hold);
- spin_unlock(&pag->pag_buf_lock);
- bp->b_relse(bp);
- } else if (!(bp->b_flags & XBF_STALE) &&
+ ASSERT(!bp->b_relse);
+ if (!(bp->b_flags & XBF_STALE) &&
atomic_read(&bp->b_lru_ref)) {
xfs_buf_lru_add(bp);
spin_unlock(&pag->pag_buf_lock);
@@ -1112,8 +1109,16 @@ xfs_bwrite(
xfs_bdstrat_cb(bp);

error = xfs_buf_iowait(bp);
- if (error)
+ if (error) {
+ /*
+ * If the error caused a release function to be set, call it
+ * now to clear the error from the buffer as we have already
+ * harvested it.
+ */
xfs_force_shutdown(mp, SHUTDOWN_META_IO_ERROR);
+ if (bp->b_relse)
+ bp->b_relse(bp);
+ }
xfs_buf_relse(bp);
return error;
}
--
1.7.2.3
Dave Chinner
2011-01-07 06:11:25 UTC
Permalink
I don't think the patch is quite correct. In the old code xfs_buf_rele
incremented the buffer reference count before calling ->b_relse,
expecting it do decrement it again.
I think the best fix is to kill ->b_relse entirely. We can simply
do the buffer callback processing and b_flags updates in
xfs_buf_iodone_callbacks. The important thing is to not clear the
buffer error there, so that it actually get propagated to the caller.
As the buffer remains locked until xfs_bwrite calls xfs_buf_relse it
can get the error reliably that way.
Looks sane. can i get a signed-off-by for it from you and I'll
include it in my remaining -for-2.6.38 series?

Cheers,

Dave.
--
Dave Chinner
***@fromorbit.com
Loading...