Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Enhancement for creating pretty extent at flush #211

Open
wants to merge 6 commits into
base: pretty-extent
Choose a base branch
from
Open

Enhancement for creating pretty extent at flush #211

wants to merge 6 commits into from

Conversation

piste-jp-ibm
Copy link
Member

Summary of changes

Change the implementation of flush() request to enqueue the remaining blocks to the writer queue. And make them processed sequentially.

Description

Previously, our flush operation might flip the final block when we get a small time window below.

  1. LTFS receives a write and the unified scheduler completes one block and send it to the Q
  2. LTFS receives another write
  3. LTFS receives a flush
  4. LTFS processes the block created in step2 within flush operation (because previous flush stops the writer thread and write blocks directly)
  5. The writer thread wake up because of step1 and process the block queued in step1

In this change, all blocks are queued in flush operation and shall be processed by writer thread.

  • Do not wait the writer thread processes the queued block when flush is called against individual files
  • Wait the writer thread processes the queued block when flush is called against all files (like periodical sync request)

I believe this strategy improves the LTFS performance in normal condition (little bit make LTFS danger but enough safe, I believe). And LTFS makes a pretty extent in any time.

Type of change

  • Enhancement with long term test

Checklist:

  • My code follows the style guidelines of this project
  • I have performed a self-review of my own code
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • My changes generate no new warnings
  • I have confirmed my fix is effective or that my feature works

Copy link
Member

@lucasvr lucasvr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi, Abe-san! Thanks for your hard work on this one, as usual! Here are some comments about your changes.

I see that you've introduced writer_cond and writer_lock, and then use broadcast() to wake up those blocked on cond_wait(). There is an issue here, which is that it is possible for threads to receive spurious awake calls (that is, the condition may hold false even though it should not). I have been bitten by this problem before, and debugging this kind of problem can be a headache. The right way to test for wake up condition is to introduce a third variable (such as a boolean like bool writer_can_wakeup = true/false), and then rewrite flush_all so it looks like this:

    pthread_mutex_lock(writer_lock);
    while (! writer_can_wakeup)
        pthread_cond_wait(writer_cond, writer_lock);
    pthread_mutex_unlock(writer_lock);

You'll have to initialize writer_can_wakeup=false and make sure to set it to true prior to your call to broadcast().

Also, I noticed that the documentation of _unified_get_dentry_priv needs to be changed to reflect the new locks that must be held when calling that function.

Please pay special attention to my question regarding the removal of lock(d->iosched_lock) in unified_read. It looks to me like that like should not have been removed.

Best regards.

struct fcfs_data *priv = (struct fcfs_data *) iosched_handle;

CHECK_ARG_NULL(path, -LTFS_NULL_ARG);
CHECK_ARG_NULL(dentry, -LTFS_NULL_ARG);
CHECK_ARG_NULL(iosched_handle, -LTFS_NULL_ARG);

return ltfs_fsraw_open(path, open_write, dentry, priv->vol);
ret = ltfs_fsraw_open(path, open_write, dentry, priv->vol);
if (!ret)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At least in the I/O schedulers we're used to write if (ret == 0) to check if a function call succeeded. I think it's good idea to use the same coding pattern, as ! ret is used to check if an operation failed.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oops, if (!ret) is used in libltfs. So I think it is good to use. I'll use if (ret == 0).

I think Brian prefers to use if (!ret), good call, and if (ret < 0), bad call`.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, I didn't remember we used that pattern in libltfs. No worries then -- this is a minor issue

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in my branch.

@@ -434,6 +477,9 @@ int unified_open(const char *path, bool open_write, struct dentry **dentry, void

ltfs_profiler_add_entry(priv->profiler, &priv->proflock, IOSCHED_REQ_ENTER(REQ_IOS_OPEN));
ret = ltfs_fsraw_open(path, open_write, dentry, ((struct unified_data *)iosched_handle)->vol);
if (!ret) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please see my previous suggestion: at least in the I/O schedulers we're used to write if (ret == 0) to check if a function call succeeded. I think it's good idea to use the same coding pattern, as ! ret is used to check if an operation failed.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in my branch.

@@ -450,22 +496,26 @@ int unified_close(struct dentry *d, bool flush, void *iosched_handle)
{
int write_error, ret = 0;
struct unified_data *priv = iosched_handle;
struct dentry_priv *dpr;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's good idea to initialize this one to NULL, just in case

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in my branch.


CHECK_ARG_NULL(d, -LTFS_NULL_ARG);
CHECK_ARG_NULL(iosched_handle, -LTFS_NULL_ARG);
ltfs_profiler_add_entry(priv->profiler, &priv->proflock, IOSCHED_REQ_ENTER(REQ_IOS_CLOSE));

acquireread_mrsw(&priv->lock);
ltfs_mutex_lock(&d->iosched_lock);
ret = _unified_get_dentry_priv(d, &dpr, priv);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here, the value of 'ret' is not checked (and it's potentially overwritten by the branch condition below)

@@ -514,8 +564,7 @@ ssize_t unified_read(struct dentry *d, char *buf, size_t size, off_t offset, voi
goto out;
releaseread_mrsw(&priv->vol->lock);

ltfs_mutex_lock(&d->iosched_lock);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are you sure you want to remove this? iosched_lock is needed to protect access to dpr->requests. If you are sure about removing it, then please note that you still have calls to ltfs_mutex_unlock() in this function.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, it is my mistake. Fixed in my branch.

Comment on lines +1548 to +1587
void _unified_put_dentry_priv(struct dentry_priv *dentry_priv, struct unified_data *priv)
{
struct dentry_priv *dpr = dentry_priv;
struct dentry *d = dpr->dentry;

acquirewrite_mrsw(&d->meta_lock);
ltfs_mutex_lock(&dpr->ref_lock);
if (dpr->numhandles > 0) {
dpr->numhandles--;
}

if (!dpr->numhandles) {
d->iosched_priv = NULL;
ltfs_mutex_unlock(&dpr->ref_lock);
releasewrite_mrsw(&d->meta_lock);

if (! TAILQ_EMPTY(&dpr->requests))
ltfsmsg(LTFS_WARN, 13022W);

/* Sent alt_extentlist to libltfs */
if (dpr->write_ip && ! TAILQ_EMPTY(&dpr->alt_extentlist))
_unified_clear_alt_extentlist(true, dpr, priv);

ltfs_mutex_destroy(&dpr->write_error_lock);
ltfs_mutex_destroy(&dpr->ref_lock);
ltfs_mutex_destroy(&dpr->io_lock);
free(dpr);

ltfs_fsraw_put_dentry(d, priv->vol);

ltfsmsg(LTFS_DEBUG, 13028D, d->name.name);
} else {
ltfsmsg(LTFS_DEBUG3, 13029D, "Dec", d->name.name, dpr->numhandles);
ltfs_mutex_unlock(&dpr->ref_lock);
releasewrite_mrsw(&d->meta_lock);
}

return;
}

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The disposal logic looks good


if (dpr) {
ltfsmsg(LTFS_DEBUG3, 13032D, req);
_unified_put_dentry_priv(req->dpr, priv);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You could use the 'dpr' alias here too

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in my branch.

@@ -1772,6 +1955,10 @@ ssize_t _unified_insert_new_request(const char *buf, off_t offset, size_t count,
releaseread_mrsw(&priv->lock);
return -LTFS_NO_MEMORY;
}

_unified_get_dentry_priv(d, &dpr, priv);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does it matter if we get a NULL dpr here or not? We're not checking that; not sure if we should.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not needed actually. I will remove this initializer for now. Fixed in my branch.

But we may need to add some initializer in the future because some compilers reports a warning.

Comment on lines +2116 to +2177
{
ssize_t ret = 0;
bool requeued = false;
struct dentry_priv *dpr;
struct write_request *req, *aux;

CHECK_ARG_NULL(d, -LTFS_NULL_ARG);
CHECK_ARG_NULL(priv, -LTFS_NULL_ARG);

_unified_get_dentry_priv(d, &dpr, priv);
if (! dpr) {
return 0;
}

/* Check for previous write errors */
ret = _unified_get_write_error(dpr);
if (ret < 0) {
_unified_put_dentry_priv(dpr, priv);
return ret;
}

if (TAILQ_EMPTY(&dpr->requests)) {
_unified_put_dentry_priv(dpr, priv);
return 0;
}

/* Enqueue requests to DP queue */
ltfs_thread_mutex_lock(&priv->queue_lock);
TAILQ_FOREACH_SAFE(req, &dpr->requests, list, aux) {
if (req->state == REQUEST_PARTIAL) {
if (dpr->in_working_set == 1) {
TAILQ_REMOVE(&priv->working_set, dpr, working_set);
--priv->ws_count;
}
if (dpr->in_working_set) {
--priv->ws_request_count;
--dpr->in_working_set;
}

req->state = REQUEST_DP;

if (! dpr->in_dp_queue) {
TAILQ_INSERT_TAIL(&priv->dp_queue, dpr, dp_queue);
++priv->dp_count;
}
if (! dpr->write_ip)
++priv->dp_request_count;
++dpr->in_dp_queue;

requeued = true;
}
}
ltfs_thread_mutex_unlock(&priv->queue_lock);

/* Tell background thread a write request is ready */
if (requeued)
ltfs_thread_cond_signal(&priv->queue_cond);

_unified_put_dentry_priv(dpr, priv);

return 0;
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The overall logic looks good

Comment on lines 2291 to 2295
} else {
if (empty)
empty = true;
else
empty = false;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This block below looks funny. Are you sure this is what you wanted to have here? You can probably delete it.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would like to set bool empty only when both queues are empty.

empty flag dp_queue working_set
true EMPTY EMPTY
false NOT EMPTY EMPTY
false EMPTY NOT EMPTY
false NOT EMPTY NOT EMPTY

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about this ?

	ltfs_thread_mutex_lock(&priv->writer_lock);
	acquirewrite_mrsw(&priv->lock);

	/* First of all, test both are empty */
	if (TAILQ_EMPTY(&priv->dp_queue) && TAILQ_EMPTY(&priv->working_set)) {
		empty = true;
	} else {
		if (! TAILQ_EMPTY(&priv->dp_queue)) {
			TAILQ_FOREACH_SAFE(dpr, &priv->dp_queue, dp_queue, aux) {
				ltfsmsg(LTFS_DEBUG, 13033D, "DP", dpr->dentry->platform_safe_name);
				ltfs_mutex_lock(&dpr->dentry->iosched_lock);
				ret = _unified_flush_unlocked(dpr->dentry, priv);
				ltfs_mutex_unlock(&dpr->dentry->iosched_lock);
				if (ret < 0) {
					ltfsmsg(LTFS_ERR, 13020E, dpr->dentry->platform_safe_name, ret);
					releasewrite_mrsw(&priv->lock);
					return ret;
				}
			}
		}

		if (! TAILQ_EMPTY(&priv->working_set)) {
			TAILQ_FOREACH_SAFE(dpr, &priv->working_set, working_set, aux) {
				ltfsmsg(LTFS_DEBUG, 13033D, "WS", dpr->dentry->platform_safe_name);
				ltfs_mutex_lock(&dpr->dentry->iosched_lock);
				ret = _unified_flush_unlocked(dpr->dentry, priv);
				ltfs_mutex_unlock(&dpr->dentry->iosched_lock);
				if (ret < 0) {
					ltfsmsg(LTFS_ERR, 13020E, dpr->dentry->platform_safe_name, ret);
					releasewrite_mrsw(&priv->lock);
					return ret;
				}
			}
		}
	}

	releasewrite_mrsw(&priv->lock);

@lucasvr
Copy link
Member

lucasvr commented Sep 22, 2020

@piste-jp-ibm, I'm now looking into reproducing the original behavior. Would you know if the file with the flipped extent needs to be a candidate for writing to the Index Partition? Also, I'm wondering if the periodic sync thread may be playing a role here..

@piste-jp-ibm
Copy link
Member Author

@lucasvr

It's a good point. The answer is yes. But I believe size of placement rules are small, like 1MiB, in most cases. So I can't see any problem around here.

But from logic point of view, we need to tape care about this.

@lucasvr
Copy link
Member

lucasvr commented Sep 23, 2020

That's really good to know, thanks! I will take a closer look at the interactions between the queues so I can understand the conditions which can lead to flipped extent entries.

@piste-jp-ibm piste-jp-ibm self-assigned this Jun 23, 2023
@piste-jp-ibm piste-jp-ibm added the to master Merge to master branch label Jun 23, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
to master Merge to master branch
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants