Enhancement for creating pretty extent at flush #211

piste-jp-ibm · 2020-09-11T06:58:06Z

Summary of changes

Change the implementation of the flush() request so that it enqueues the remaining blocks to the writer queue and they are processed sequentially.

Description

Previously, our flush operation could flip the order of the final blocks when the following small time window occurred:

1. LTFS receives a write, and the unified scheduler completes one block and sends it to the queue.
2. LTFS receives another write.
3. LTFS receives a flush.
4. LTFS processes the block created in step 2 within the flush operation (because the previous flush implementation stopped the writer thread and wrote blocks directly).
5. The writer thread wakes up because of step 1 and processes the block queued in step 1, so the block from step 2 reaches the tape before the block from step 1.

With this change, all blocks are queued by the flush operation and processed by the writer thread.

When flush is called against an individual file, do not wait for the writer thread to process the queued blocks.
When flush is called against all files (for example, a periodic sync request), wait for the writer thread to process the queued blocks.

I believe this strategy improves LTFS performance under normal conditions (it makes LTFS slightly less safe, but I believe it is still safe enough). It also lets LTFS create a pretty extent at any time.
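The queueing strategy described above can be sketched as follows. This is a minimal, self-contained model under stated assumptions, not the LTFS unified scheduler: the names `writer_queue`, `wq_enqueue`, `flush_file`, `flush_all` and `wq_stop` are hypothetical, blocks are plain integer ids, and the "tape" is an array. Because a single writer thread is the only consumer of one FIFO, a block handed over by flush cannot overtake a block queued earlier, which is the ordering property the description relies on.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

#define WQ_CAP 64   /* sketch only: fixed capacity, no wrap-around */

/* Hypothetical single-writer FIFO; not an LTFS data structure. */
struct writer_queue {
    pthread_mutex_t lock;
    pthread_cond_t  work_cond;   /* signaled when a block is enqueued */
    pthread_cond_t  drain_cond;  /* broadcast when the queue becomes empty */
    int    blocks[WQ_CAP];       /* queued block ids */
    size_t head, tail;           /* next block to write / next free slot */
    int    written[WQ_CAP];      /* stands in for the tape: write order */
    size_t nwritten;
    bool   stop;
};

static void wq_init(struct writer_queue *q)
{
    pthread_mutex_init(&q->lock, NULL);
    pthread_cond_init(&q->work_cond, NULL);
    pthread_cond_init(&q->drain_cond, NULL);
    q->head = q->tail = q->nwritten = 0;
    q->stop = false;
}

/* The only consumer: blocks are written strictly in FIFO order. */
static void *writer_thread(void *arg)
{
    struct writer_queue *q = arg;

    pthread_mutex_lock(&q->lock);
    for (;;) {
        while (q->head == q->tail && !q->stop)
            pthread_cond_wait(&q->work_cond, &q->lock);
        if (q->head == q->tail)          /* empty and asked to stop */
            break;
        q->written[q->nwritten++] = q->blocks[q->head++];
        if (q->head == q->tail)
            pthread_cond_broadcast(&q->drain_cond);
    }
    pthread_mutex_unlock(&q->lock);
    return NULL;
}

/* Used both by normal writes and by flush. */
static void wq_enqueue(struct writer_queue *q, int blk)
{
    pthread_mutex_lock(&q->lock);
    assert(q->tail < WQ_CAP);
    q->blocks[q->tail++] = blk;
    pthread_cond_signal(&q->work_cond);
    pthread_mutex_unlock(&q->lock);
}

/* Flush of one file: hand the partial block to the writer and return. */
static void flush_file(struct writer_queue *q, int partial_blk)
{
    wq_enqueue(q, partial_blk);
}

/* Flush of all files (e.g. periodic sync): enqueue, then wait for the drain. */
static void flush_all(struct writer_queue *q, const int *partial, size_t n)
{
    for (size_t i = 0; i < n; i++)
        wq_enqueue(q, partial[i]);

    pthread_mutex_lock(&q->lock);
    while (q->head != q->tail)           /* predicate re-checked after each wakeup */
        pthread_cond_wait(&q->drain_cond, &q->lock);
    pthread_mutex_unlock(&q->lock);
}

static void wq_stop(struct writer_queue *q, pthread_t tid)
{
    pthread_mutex_lock(&q->lock);
    q->stop = true;
    pthread_cond_signal(&q->work_cond);
    pthread_mutex_unlock(&q->lock);
    pthread_join(tid, NULL);
}
```

In a real scheduler the writer would drop the lock while doing tape I/O; this sketch keeps it held for brevity.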

Type of change

Enhancement with long term test

Checklist:

My code follows the style guidelines of this project
I have performed a self-review of my own code
I have commented my code, particularly in hard-to-understand areas
I have made corresponding changes to the documentation
My changes generate no new warnings
I have confirmed my fix is effective or that my feature works

lucasvr

Hi, Abe-san! Thanks for your hard work on this one, as usual! Here are some comments about your changes.

I see that you've introduced writer_cond and writer_lock, and then use broadcast() to wake up the threads blocked on cond_wait(). There is an issue here: threads can receive spurious wakeups (that is, cond_wait() may return even though the condition being waited for is still false). I have been bitten by this problem before, and debugging it can be a headache. The right way to test the wake-up condition is to introduce a third variable (such as a boolean like bool writer_can_wakeup = true/false), and then rewrite flush_all so it looks like this:

    pthread_mutex_lock(writer_lock);
    while (! writer_can_wakeup)
        pthread_cond_wait(writer_cond, writer_lock);
    pthread_mutex_unlock(writer_lock);

You'll have to initialize writer_can_wakeup=false and make sure to set it to true prior to your call to broadcast().
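A minimal, self-contained version of the predicate loop suggested above, written against plain POSIX threads rather than the LTFS ltfs_thread_* wrappers. writer_lock, writer_cond and writer_can_wakeup follow the names in the review (here as objects rather than pointers); wait_for_writer() and writer_done() are hypothetical helpers. The predicate is re-checked in a while loop after every return from pthread_cond_wait(), and the signaler sets it while holding the same mutex before broadcasting.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

static pthread_mutex_t writer_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  writer_cond = PTHREAD_COND_INITIALIZER;
static bool writer_can_wakeup = false;   /* the predicate, initially false */

/* Waiter side (e.g. flush_all): a spurious wakeup just loops back. */
static void wait_for_writer(void)
{
    pthread_mutex_lock(&writer_lock);
    while (!writer_can_wakeup)
        pthread_cond_wait(&writer_cond, &writer_lock);
    pthread_mutex_unlock(&writer_lock);
}

/* Signaler side (e.g. writer thread): set the predicate under the mutex,
 * then broadcast, so no waiter can miss the change. */
static void *writer_done(void *arg)
{
    (void)arg;
    pthread_mutex_lock(&writer_lock);
    writer_can_wakeup = true;
    pthread_cond_broadcast(&writer_cond);
    pthread_mutex_unlock(&writer_lock);
    return NULL;
}
```

If the same flag is reused for the next flush, it has to be reset to false under writer_lock before waiting again; otherwise the loop returns immediately.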

Also, I noticed that the documentation of _unified_get_dentry_priv needs to be changed to reflect the new locks that must be held when calling that function.

Please pay special attention to my question regarding the removal of lock(d->iosched_lock) in unified_read. It looks to me like that line should not have been removed.

Best regards.

lucasvr · 2020-09-14T02:30:53Z

src/iosched/fcfs.c

 	struct fcfs_data *priv = (struct fcfs_data *) iosched_handle;

 	CHECK_ARG_NULL(path, -LTFS_NULL_ARG);
 	CHECK_ARG_NULL(dentry, -LTFS_NULL_ARG);
 	CHECK_ARG_NULL(iosched_handle, -LTFS_NULL_ARG);

-	return ltfs_fsraw_open(path, open_write, dentry, priv->vol);
+	ret = ltfs_fsraw_open(path, open_write, dentry, priv->vol);
+	if (!ret)


At least in the I/O schedulers, we usually write if (ret == 0) to check whether a function call succeeded. I think it's a good idea to use the same coding pattern, as ! ret reads like a check that an operation failed.

Oops, if (!ret) is used in libltfs, so I think it is fine to use. But I'll use if (ret == 0) here.

I think Brian prefers if (!ret) for a successful call and if (ret < 0) for a failed call.

Ah, I didn't remember we used that pattern in libltfs. No worries then; this is a minor issue.

Fixed in my branch.

lucasvr · 2020-09-14T02:32:13Z

src/iosched/unified.c

@@ -434,6 +477,9 @@ int unified_open(const char *path, bool open_write, struct dentry **dentry, void

 	ltfs_profiler_add_entry(priv->profiler, &priv->proflock, IOSCHED_REQ_ENTER(REQ_IOS_OPEN));
 	ret = ltfs_fsraw_open(path, open_write, dentry, ((struct unified_data *)iosched_handle)->vol);
+	if (!ret) {


Please see my previous suggestion: at least in the I/O schedulers, we usually write if (ret == 0) to check whether a function call succeeded. I think it's a good idea to use the same coding pattern, as ! ret reads like a check that an operation failed.

Fixed in my branch.

lucasvr · 2020-09-14T02:32:41Z

src/iosched/unified.c

@@ -450,22 +496,26 @@ int unified_close(struct dentry *d, bool flush, void *iosched_handle)
 {
 	int write_error, ret = 0;
 	struct unified_data *priv = iosched_handle;
+	struct dentry_priv *dpr;


It's a good idea to initialize this one to NULL, just in case.

Fixed in my branch.

lucasvr · 2020-09-14T02:34:13Z

src/iosched/unified.c


 	CHECK_ARG_NULL(d, -LTFS_NULL_ARG);
 	CHECK_ARG_NULL(iosched_handle, -LTFS_NULL_ARG);
 	ltfs_profiler_add_entry(priv->profiler, &priv->proflock, IOSCHED_REQ_ENTER(REQ_IOS_CLOSE));

 	acquireread_mrsw(&priv->lock);
 	ltfs_mutex_lock(&d->iosched_lock);
+	ret = _unified_get_dentry_priv(d, &dpr, priv);


Here, the value of 'ret' is not checked (and it's potentially overwritten by the branch condition below)
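One way to address the unchecked ret is to test it immediately after the lookup, before any later call reuses the variable. The sketch below uses hypothetical stand-ins (struct priv, get_priv(), flush_priv(), close_sketch()) rather than the real _unified_get_dentry_priv(); the negative return values only stand in for -LTFS_* error codes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the per-dentry private data. */
struct priv {
    int numhandles;
};

/* Lookup: 0 and *out set on success, negative code and *out NULL on failure. */
static int get_priv(struct priv *src, struct priv **out)
{
    *out = NULL;
    if (!src)
        return -1;                 /* stands in for a negative -LTFS_* code */
    *out = src;
    return 0;
}

/* A later step that also reports through ret. */
static int flush_priv(struct priv *dpr)
{
    return dpr->numhandles > 0 ? 0 : -2;
}

static int close_sketch(struct priv *src, bool flush)
{
    int ret;
    struct priv *dpr = NULL;       /* initialized, as suggested in the review */

    ret = get_priv(src, &dpr);
    if (ret < 0)                   /* checked before anything overwrites it */
        return ret;

    if (flush)
        ret = flush_priv(dpr);     /* reusing ret is safe from here on */
    return ret;
}
```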

lucasvr · 2020-09-14T02:36:00Z

src/iosched/unified.c

@@ -514,8 +564,7 @@ ssize_t unified_read(struct dentry *d, char *buf, size_t size, off_t offset, voi
 		goto out;
 	releaseread_mrsw(&priv->vol->lock);

-	ltfs_mutex_lock(&d->iosched_lock);


Are you sure you want to remove this? iosched_lock is needed to protect access to dpr->requests. If you are sure about removing it, then please note that you still have calls to ltfs_mutex_unlock() in this function.

No, it is my mistake. Fixed in my branch.
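A common way to keep lock and unlock calls balanced in a function with several exits is to take the lock once and route every exit through a single label that releases it; the goto out visible in the unified_read hunk hints at the same structure. The sketch is hypothetical: struct dentry_sketch and read_sketch() are illustrative names, and a plain pthread mutex stands in for ltfs_mutex_*.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical per-dentry state; iosched_lock protects nrequests. */
struct dentry_sketch {
    pthread_mutex_t iosched_lock;
    int nrequests;                             /* stands in for dpr->requests */
};

static long read_sketch(struct dentry_sketch *d, size_t size)
{
    long ret;

    pthread_mutex_lock(&d->iosched_lock);      /* locked exactly once */
    if (size == 0) {
        ret = 0;
        goto out;                              /* early exit still unlocks */
    }
    ret = d->nrequests;                        /* access under the lock */

out:
    pthread_mutex_unlock(&d->iosched_lock);    /* unlocked exactly once */
    return ret;
}
```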

lucasvr · 2020-09-14T02:49:23Z

src/iosched/unified.c

+void _unified_put_dentry_priv(struct dentry_priv *dentry_priv, struct unified_data *priv)
+{
+	struct dentry_priv *dpr = dentry_priv;
+	struct dentry      *d   = dpr->dentry;
+
+	acquirewrite_mrsw(&d->meta_lock);
+	ltfs_mutex_lock(&dpr->ref_lock);
+	if (dpr->numhandles > 0) {
+		dpr->numhandles--;
+	}
+
+	if (!dpr->numhandles) {
+		d->iosched_priv = NULL;
+		ltfs_mutex_unlock(&dpr->ref_lock);
+		releasewrite_mrsw(&d->meta_lock);
+
+		if (! TAILQ_EMPTY(&dpr->requests))
+			ltfsmsg(LTFS_WARN, 13022W);
+
+		/* Send alt_extentlist to libltfs */
+		if (dpr->write_ip && ! TAILQ_EMPTY(&dpr->alt_extentlist))
+			_unified_clear_alt_extentlist(true, dpr, priv);
+
+		ltfs_mutex_destroy(&dpr->write_error_lock);
+		ltfs_mutex_destroy(&dpr->ref_lock);
+		ltfs_mutex_destroy(&dpr->io_lock);
+		free(dpr);
+
+		ltfs_fsraw_put_dentry(d, priv->vol);
+
+		ltfsmsg(LTFS_DEBUG, 13028D, d->name.name);
+	} else {
+		ltfsmsg(LTFS_DEBUG3, 13029D, "Dec", d->name.name, dpr->numhandles);
+		ltfs_mutex_unlock(&dpr->ref_lock);
+		releasewrite_mrsw(&d->meta_lock);
+	}
+
+	return;
+}
+


The disposal logic looks good
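The get/put disposal pattern reviewed above can be reduced to the following sketch, assuming a simplified model: struct owner, struct dpriv, get_priv() and put_priv() are hypothetical, and a pthread mutex replaces the MRSW meta_lock. As in the reviewed _unified_put_dentry_priv(), the last put clears the owner's pointer while holding the outer lock, so a concurrent get cannot pick up an object that is about to be freed, and the teardown happens after the locks are dropped.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical refcounted private data, loosely modeled on dentry_priv. */
struct dpriv {
    pthread_mutex_t ref_lock;
    unsigned numhandles;
};

struct owner {
    pthread_mutex_t meta_lock;     /* protects the owner->priv pointer */
    struct dpriv *priv;
};

/* Get: create on first use, otherwise bump the count. */
static struct dpriv *get_priv(struct owner *o)
{
    pthread_mutex_lock(&o->meta_lock);
    struct dpriv *p = o->priv;
    if (!p) {
        p = calloc(1, sizeof(*p));
        if (!p) {
            pthread_mutex_unlock(&o->meta_lock);
            return NULL;
        }
        pthread_mutex_init(&p->ref_lock, NULL);
        o->priv = p;
    }
    pthread_mutex_lock(&p->ref_lock);
    p->numhandles++;
    pthread_mutex_unlock(&p->ref_lock);
    pthread_mutex_unlock(&o->meta_lock);
    return p;
}

/* Put: drop one reference; the last one detaches and frees the object. */
static void put_priv(struct owner *o, struct dpriv *p)
{
    pthread_mutex_lock(&o->meta_lock);
    pthread_mutex_lock(&p->ref_lock);
    if (p->numhandles > 0)
        p->numhandles--;
    if (p->numhandles == 0) {
        o->priv = NULL;            /* detach while still holding meta_lock */
        pthread_mutex_unlock(&p->ref_lock);
        pthread_mutex_unlock(&o->meta_lock);
        pthread_mutex_destroy(&p->ref_lock);
        free(p);
        return;
    }
    pthread_mutex_unlock(&p->ref_lock);
    pthread_mutex_unlock(&o->meta_lock);
}
```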

lucasvr · 2020-09-14T02:49:51Z

src/iosched/unified.c

+
+	if (dpr) {
+		ltfsmsg(LTFS_DEBUG3, 13032D, req);
+		_unified_put_dentry_priv(req->dpr, priv);


You could use the 'dpr' alias here too

Fixed in my branch.

lucasvr · 2020-09-14T02:50:19Z

src/iosched/unified.c

@@ -1772,6 +1955,10 @@ ssize_t _unified_insert_new_request(const char *buf, off_t offset, size_t count,
 		releaseread_mrsw(&priv->lock);
 		return -LTFS_NO_MEMORY;
 	}
+
+	_unified_get_dentry_priv(d, &dpr, priv);


Does it matter if we get a NULL dpr here or not? We're not checking that; not sure if we should.

It's not needed actually. I will remove this initializer for now. Fixed in my branch.

But we may need to add an initializer in the future, because some compilers report a warning.

lucasvr · 2020-09-14T02:54:29Z

src/iosched/unified.c

+{
+	ssize_t ret = 0;
+	bool requeued = false;
+	struct dentry_priv *dpr;
+	struct write_request *req, *aux;
+
+	CHECK_ARG_NULL(d, -LTFS_NULL_ARG);
+	CHECK_ARG_NULL(priv, -LTFS_NULL_ARG);
+
+	_unified_get_dentry_priv(d, &dpr, priv);
+	if (! dpr) {
+		return 0;
+	}
+
+	/* Check for previous write errors */
+	ret = _unified_get_write_error(dpr);
+	if (ret < 0) {
+		_unified_put_dentry_priv(dpr, priv);
+		return ret;
+	}
+
+	if (TAILQ_EMPTY(&dpr->requests)) {
+		_unified_put_dentry_priv(dpr, priv);
+		return 0;
+	}
+
+	/* Enqueue requests to DP queue */
+	ltfs_thread_mutex_lock(&priv->queue_lock);
+	TAILQ_FOREACH_SAFE(req, &dpr->requests, list, aux) {
+		if (req->state == REQUEST_PARTIAL) {
+			if (dpr->in_working_set == 1) {
+				TAILQ_REMOVE(&priv->working_set, dpr, working_set);
+				--priv->ws_count;
+			}
+			if (dpr->in_working_set) {
+				--priv->ws_request_count;
+				--dpr->in_working_set;
+			}
+
+			req->state = REQUEST_DP;
+
+			if (! dpr->in_dp_queue) {
+				TAILQ_INSERT_TAIL(&priv->dp_queue, dpr, dp_queue);
+				++priv->dp_count;
+			}
+			if (! dpr->write_ip)
+				++priv->dp_request_count;
+			++dpr->in_dp_queue;
+
+			requeued = true;
+		}
+	}
+	ltfs_thread_mutex_unlock(&priv->queue_lock);
+
+	/* Tell background thread a write request is ready */
+	if (requeued)
+		ltfs_thread_cond_signal(&priv->queue_cond);
+
+	_unified_put_dentry_priv(dpr, priv);
+
+	return 0;
+}


The overall logic looks good
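The requeue step above can be modeled in isolation. This sketch is hypothetical and simplified: it keeps only per-request states and two counters (struct sched, struct request, requeue_partial()), and omits the per-dentry working-set and DP-queue membership that the real code maintains. It shows two properties worth preserving: the state changes happen under queue_lock, and the writer is signaled once per flush, after the lock is released, only if something was actually requeued. It relies on the TAILQ macros from <sys/queue.h> available on Linux and the BSDs.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <sys/queue.h>

enum req_state { REQUEST_PARTIAL, REQUEST_DP };

struct request {
    enum req_state state;
    TAILQ_ENTRY(request) list;
};
TAILQ_HEAD(request_list, request);

struct sched {
    pthread_mutex_t queue_lock;
    pthread_cond_t  queue_cond;   /* the writer thread waits on this */
    int ws_request_count;         /* requests still in the working set */
    int dp_request_count;         /* requests ready for the writer */
};

/* Mark every partial request as ready for the writer and wake it once. */
static bool requeue_partial(struct sched *s, struct request_list *reqs)
{
    struct request *req;
    bool requeued = false;

    pthread_mutex_lock(&s->queue_lock);
    TAILQ_FOREACH(req, reqs, list) {   /* requests stay on the per-file list */
        if (req->state == REQUEST_PARTIAL) {
            req->state = REQUEST_DP;
            s->ws_request_count--;
            s->dp_request_count++;
            requeued = true;
        }
    }
    pthread_mutex_unlock(&s->queue_lock);

    if (requeued)                      /* one signal for the whole batch */
        pthread_cond_signal(&s->queue_cond);
    return requeued;
}
```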

lucasvr · 2020-09-14T02:54:59Z

src/iosched/unified.c

+	} else {
+		if (empty)
+			empty = true;
+		else
+			empty = false;


This block looks funny. Are you sure this is what you wanted here? You can probably delete it.

I would like to set bool empty only when both queues are empty.

    empty flag   dp_queue    working_set
    true         EMPTY       EMPTY
    false        NOT EMPTY   EMPTY
    false        EMPTY       NOT EMPTY
    false        NOT EMPTY   NOT EMPTY

How about this?

    ltfs_thread_mutex_lock(&priv->writer_lock);
    acquirewrite_mrsw(&priv->lock);

    /* First of all, test whether both are empty */
    if (TAILQ_EMPTY(&priv->dp_queue) && TAILQ_EMPTY(&priv->working_set)) {
        empty = true;
    } else {
        if (! TAILQ_EMPTY(&priv->dp_queue)) {
            TAILQ_FOREACH_SAFE(dpr, &priv->dp_queue, dp_queue, aux) {
                ltfsmsg(LTFS_DEBUG, 13033D, "DP", dpr->dentry->platform_safe_name);
                ltfs_mutex_lock(&dpr->dentry->iosched_lock);
                ret = _unified_flush_unlocked(dpr->dentry, priv);
                ltfs_mutex_unlock(&dpr->dentry->iosched_lock);
                if (ret < 0) {
                    ltfsmsg(LTFS_ERR, 13020E, dpr->dentry->platform_safe_name, ret);
                    releasewrite_mrsw(&priv->lock);
                    return ret;
                }
            }
        }
        if (! TAILQ_EMPTY(&priv->working_set)) {
            TAILQ_FOREACH_SAFE(dpr, &priv->working_set, working_set, aux) {
                ltfsmsg(LTFS_DEBUG, 13033D, "WS", dpr->dentry->platform_safe_name);
                ltfs_mutex_lock(&dpr->dentry->iosched_lock);
                ret = _unified_flush_unlocked(dpr->dentry, priv);
                ltfs_mutex_unlock(&dpr->dentry->iosched_lock);
                if (ret < 0) {
                    ltfsmsg(LTFS_ERR, 13020E, dpr->dentry->platform_safe_name, ret);
                    releasewrite_mrsw(&priv->lock);
                    return ret;
                }
            }
        }
    }
    releasewrite_mrsw(&priv->lock);

lucasvr · 2020-09-22T20:03:21Z

@piste-jp-ibm, I'm now looking into reproducing the original behavior. Would you know if the file with the flipped extent needs to be a candidate for writing to the Index Partition? Also, I'm wondering if the periodic sync thread may be playing a role here.

piste-jp-ibm · 2020-09-22T22:56:20Z

@lucasvr

Good point. The answer is yes. But I believe the sizes in placement rules are small, around 1 MiB, in most cases, so I don't see any problem here.

But from a logical point of view, we need to take care of this.

lucasvr · 2020-09-23T01:25:42Z

That's really good to know, thanks! I will take a closer look at the interactions between the queues so I can understand the conditions which can lead to flipped extent entries.

piste-jp-ibm added 5 commits August 18, 2020 21:27

Introduce refcount to DPR

36b1fd7

Introduce defered flush

7826779

Support flush all correctly

10be046

Introduce request counter on profiler

5fce7dd

Enhance comments and messages

4c406a2

piste-jp-ibm requested a review from lucasvr September 11, 2020 06:58

lucasvr reviewed Sep 14, 2020


Refrect Lucas's comments

5afeed1

piste-jp-ibm self-assigned this Jun 23, 2023

piste-jp-ibm added the "to master" label (Merge to master branch) Jun 23, 2023


Review thread activity: lucasvr commented on Sep 14, 2020; piste-jp-ibm replied on Sep 14, 2020 and Dec 2, 2020.
