Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Expose api for attaching and detaching Storage.localMarkHolder #5

Open
marcoferrer opened this issue Aug 23, 2019 · 4 comments
Open

Comments

@marcoferrer
Copy link

Hi there. I know this project is primarily targeting java, but I was interested in using it with Kotlins Coroutines (Fibers).

One thing holding me back is that suspending methods in kotlin can pause and resume on a different thread than the one it was invoked on.

suspend fun doSomething(){

    PerfMark.startTask("doSomething")
    try {
        // The current method is suspended while 'longRunningOperation' is executed in
        // another thread.
        longRunningOperation()

        // The rest of the methods execution resumes on which ever thread
        // happens pick up the task from a fork join pool
    } finally {
        // Will not work correctly due to the thread local no longer being present.
        PerfMark.stopTask("doSomething")
    }
}

The coroutines API does provide ways to handle this but they would require access to the
thread local.

suspend fun doSomething(){

    PerfMark.startTask("doSomething")
    // The thread local context element is takes care of attaching and detaching
    // values to the current thread before executing the next block of code
    withContext(Storage.localMarkHolder.asContextElement()) {
        try {
            longRunningOperation()
        } finally {
            // Code is resumed on a different thread, but the coroutine dispatcher
            // took care of ensuring our thread local is present before execution.
            PerfMark.stopTask("doSomething")
        }
    }
}

We ran into the same issue previously with io.grpc.Context and were able to leverage the attach() and detach() APIs to get maintain the correct context on the thread during coroutine execution. GrpcContextElement.kt

Would exposing similar methods unnecessarily complicate PerkMarks internals? Ive considered using linkIn but I dont think it was designed with this particular use-case in mind.

@carl-mastrangelo
Copy link
Member

@marcoferrer would using two PerfMark.event calls work using a tag? I know it's not as clean, but you could still make some sense of how long the "start" and "stop" region is.

Does kotlin expose some other unique identifier as opposed to ThreadId? It may be possible to record using a coroutine id, and stitch the results back together when reading out the marks.

@marcoferrer
Copy link
Author

Coroutine id is an internal value and only populated while running in debug mode. There is coroutine name and it is public and configurable.

suspend fun doSomething(){
    withContext(CoroutineName("someuniq-value")) {
        PerfMark.startTask("doSomething")
        try {
            longRunningOperation()
        } finally {
            // Code is resumed on a different thread, but the coroutine dispatcher
            // took care of ensuring our thread local is present before execution.
            PerfMark.stopTask("doSomething")
        }
    }
}

Coroutine name doe not enforce uniqueness though. Also its a widely used API, and its value is easily overwritten. This would make its usage painful.

I think a safer approach would be implementing a context element specific to perfmark.

suspend fun doSomething(){
    withContext(PerfMarkTaskId("someuniq-value")) {

        val taskId = coroutineContext[PerfMarkTaskId]
        //  PerfMark.setRecordId(taskId) ???
        PerfMark.startTask("doSomething")
        try {
            longRunningOperation()
        } finally {
            //  PerfMark.setRecordId(taskId) ???
            PerfMark.stopTask("doSomething")
        }
    }
}

We would even be able to provide a PerfMark API specific to coroutines and keep the perfmark context element private.

suspend fun doSomething(){
    
    perfMarkTask("doSomething"){
        longRunningOperation()
    }
}

inline suspend fun perfMarkTask(taskName: String, block: ()-> Unit){
    // coroutineContext is implicit in suspending functions
    // It is propagated via the compiler
    val currentTaskId = coroutineContext[PerfMarkTaskId]     
    
    val pTaskId = if (currentPTaskId != null) {
         // Link new sub task using currentTaskId ???
         // Create uid for this new task
         PerfMarkTaskId("something-uniqe")
    }   else {
         PerfMarkTaskId("something-uniqe")
    }
    
    withContext(pTaskId){
        try {
            PerfMark.startTaskInContext(coroutineContext, taskName)
            block()
        } finally {
            withContext(NonCancellable){
               PerfMark.stopTaskFromContext(taskName)
            }
       }
    }
}

Coroutines propagate their context down to their children. So if there is a parent task it will always be available provided uses follow best practices of coroutines structure concurrency.

Im not thoroughly familiar with perfmarks internals so I dont know how much of this makes sense but at least its an idea.

@carl-mastrangelo
Copy link
Member

@marcoferrer If io.perfmark.impl.Storage were pluggable, it could provide the correctly scoped (thread, coroutine, etc.) MarkHolder. The most serious issue with this is that there is a cost to providing this ability.

At the moment, getting the current threadlocal costs about 10-15ns (hash table look up), along with recording the values in the MarkHolder (3-6ns), and for most operations getting System.nanoTime (around 25-30ns). I think something that could work with the Kotlin context based approach you show would be using a ConcurrentHashMap instead of the threadlocal, and keyed on an id allocated upon entering the top level context. Some questions:

  1. Are you okay paying the more expensive ConcurrentHashMap lookup cost on each record? What is an acceptable amount of overhead? With CHM as storage, the risk of allocating hash table entries is much higher, making the performance more variable. (TLS allocates once per thread, which is likely lower).

  2. In a given Coroutine context tree, can multiple coroutines be actively writing to the same context? PerfMark storage is designed around a SPMC lossy queue. Multiple writers would break key assumptions and likely have a performance impact if modified.

  3. Allocating an ID in a coroutine context is fairly expensive if you want it to be unique. (PerfMark Links have this problem). A regular atomic integer is going to be a bottle neck when there are lots of threads. How would you work around the contention bottle neck?

  4. I am not super familiar with how Kotlin works, are there well defined edges at where a thread hop can happen? Knowing this would help guide the API design.

@carl-mastrangelo
Copy link
Member

I know this is very late after the the original request but I have come up with an API that I think allows Kotlin to enter and exit the coroutines. I would be interested in your feedback @marcoferrer with the changes made in https://github.com/perfmark/perfmark/releases/tag/v0.26.0

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants