ISDK-2241: Co-Viewing example with custom AudioDevice #325

Open · wants to merge 97 commits into master

Conversation

@ceaglest (Contributor) commented Nov 8, 2018

Summary

This PR adds a new Co-Viewing example.

The sample app is mostly functional, but the UI and README.md are not yet complete. These items have separate tickets and will be addressed after this PR (do we need a feature branch?).

The app is hardcoded with streamable content for presenters (see TODOs). Launching the app and tapping "Presenter" selects the hardcoded remote content URL. However, the app can also open the following document types:

public.mpeg-4
com.apple.quicktime-movie

If you encounter one of these types of videos anywhere on your iOS device (for example, in the Files app or Dropbox), you can open it with Co-Viewing. This immediately connects to a Room as a Presenter.
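
A rough sketch of what that entry point could look like; the file-extension check and the hand-off helper are illustrative assumptions, not the exact code in this PR:

```swift
import UIKit
import AVFoundation

// Hypothetical sketch of the document-opening entry point.
class AppDelegate: UIResponder, UIApplicationDelegate {
    var window: UIWindow?

    func application(_ app: UIApplication,
                     open url: URL,
                     options: [UIApplication.OpenURLOptionsKey: Any] = [:]) -> Bool {
        // The URL points at an .mp4/.m4v/.mov file opened from Files, Dropbox, etc.
        guard ["mp4", "m4v", "mov"].contains(url.pathExtension.lowercased()) else {
            return false
        }
        presentPresenterFlow(with: AVPlayer(url: url))
        return true
    }

    private func presentPresenterFlow(with player: AVPlayer) {
        // Hypothetical: show the Presenter UI and immediately connect to a Room
        // with `player` (see the Design sketch below).
    }
}
```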

Design

The Co-Viewing app has both Presenter and Viewer roles. A Viewer is a fairly standard Participant who shares their camera and microphone. A Presenter shares their camera, the video content, and a single audio track containing both microphone and player audio.

[Diagram: Co-Viewing architecture]

[Diagram: Co-Viewing viewer architecture]

Please refer to the internal design doc for more info. I will also add a more detailed diagram of the audio pipeline.
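
For a rough idea of how the Presenter's tracks fit together, here is a minimal sketch assuming the TwilioVideo 2.x API and the ExampleAVPlayerAudioDevice / ExampleAVPlayerSource classes added in this PR; the initializer arguments, track names, and helper signature are assumptions, and error handling is omitted:

```swift
import AVFoundation
import TwilioVideo

// A sketch only, not the exact code in this PR.
func connectAsPresenter(with player: AVPlayer,
                        accessToken: String,
                        delegate: TVIRoomDelegate) -> TVIRoom {
    // The custom device plays remote Participant audio and captures a mix of the
    // microphone and the AVPlayer's audio. It must be installed before other SDK calls.
    TwilioVideo.audioDevice = ExampleAVPlayerAudioDevice()

    // Camera track, just as a regular Participant would share.
    let camera = TVICameraCapturer(source: .frontCamera)!
    let cameraTrack = TVILocalVideoTrack(capturer: camera, enabled: true, constraints: nil, name: "camera")!

    // Content track, fed with frames pulled from the AVPlayerItem's video output.
    let contentSource = ExampleAVPlayerSource(item: player.currentItem!)
    let contentTrack = TVILocalVideoTrack(capturer: contentSource, enabled: true, constraints: nil, name: "content")!

    // A single audio track carrying both microphone and player audio via the custom device.
    let audioTrack = TVILocalAudioTrack(options: nil, enabled: true, name: "mixed-audio")!

    let options = TVIConnectOptions(token: accessToken) { builder in
        builder.roomName = "co-viewing"
        builder.videoTracks = [cameraTrack, contentTrack]
        builder.audioTracks = [audioTrack]
    }
    // The returned TVIRoom must be retained for the lifetime of the connection.
    return TwilioVideo.connect(with: options, delegate: delegate)
}
```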

Limitations

The world of video playback is complex, and AVPlayer seamlessly handles a lot of different content types. Not all of these types are compatible with CoViewing for various reasons. For example, don't expect to be able to stream any FairPlay encrypted content.

The following table lists what kind of content I've tested.

| Content Type | Audio Tap | Video Output | Notes |
| --- | --- | --- | --- |
| Local (.mp4, .m4v, .mov) | | | Content transforms are not handled for portrait video. |
| Progressive Stream (.mp4, .m4v, .mov) | | | |
| HTTP Live Stream (.m3u8) | | | MTAudioProcessingTap is prepared, but doesn't receive samples. |
| FairPlay Encrypted (remote) | | | Untested, known not to work. |
| FairPlay Encrypted (local) | | | Untested. |
| HDR10, Dolby Vision | | | Untested. We request 8-bit 4:2:0 NV12 buffers. |

About HLS

If you really want to stream HLS content, there might be a way. An interested developer could perform several steps to convert a live stream into a progressive-download mp4.

  1. For a given HLS URI, replace https with a custom scheme appscheme.
  2. Provide a custom AVAssetResourceLoader.
  3. Parse the m3u8 playlist.
  4. If it is a master playlist, choose a quality level which is appropriate. Otherwise skip to step 6.
  5. Parse the m3u8 playlist for your selected quality level.
  6. Fetch the transport stream (.ts) or fragmented mp4 (.fmp4) segments in response to AVPlayer's requests.
  7. Demux and Remux to .mp4.

This could be done with heavy dependencies like ffmpeg (remuxing is a one-liner in the CLI), or with lighter open source projects. You could also imagine an offline version of the same process, which downloads all of the segments and then assembles a playable .mp4 file. A minimal sketch of steps 1–2 is shown below.
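
A minimal sketch of steps 1–2, assuming a hypothetical PlaylistResourceLoader delegate class; the playlist parsing and remuxing of steps 3–7 are only indicated by comments:

```swift
import AVFoundation

// Intercepts the custom "appscheme" URLs so requests route through our delegate.
class PlaylistResourceLoader: NSObject, AVAssetResourceLoaderDelegate {
    func resourceLoader(_ resourceLoader: AVAssetResourceLoader,
                        shouldWaitForLoadingOfRequestedResource loadingRequest: AVAssetResourceLoadingRequest) -> Bool {
        guard let url = loadingRequest.request.url,
              var components = URLComponents(url: url, resolvingAgainstBaseURL: false) else {
            return false
        }
        // Restore the real scheme before fetching the playlist or segment.
        components.scheme = "https"
        let realURL = components.url!

        URLSession.shared.dataTask(with: realURL) { data, _, error in
            if let data = data {
                // Steps 3–7 would go here: parse the m3u8, fetch the .ts / .fmp4
                // segments, and remux them into a progressive .mp4 before responding.
                loadingRequest.dataRequest?.respond(with: data)
                loadingRequest.finishLoading()
            } else {
                loadingRequest.finishLoading(with: error)
            }
        }.resume()
        return true
    }
}

// Usage: replace "https" with the custom scheme so the delegate is consulted.
let hlsURL = URL(string: "appscheme://example.com/stream/master.m3u8")!
let asset = AVURLAsset(url: hlsURL)
let loaderDelegate = PlaylistResourceLoader()
asset.resourceLoader.setDelegate(loaderDelegate, queue: DispatchQueue(label: "resource-loader"))
let item = AVPlayerItem(asset: asset)
```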

However, I don't know how Apple's App Store reviewers would feel about such a technique. Ultimately, it would be great if they fixed the bug with AVPlayer and MTAudioProcessingTap, or provided sample code which demonstrates how to use it with HLS content.

TODO

  • Finish sample rate conversion for recording.
  • Decide on final mixing solution for recording.
  • Revisit test content.

ceaglest and others added 30 commits October 29, 2018 16:48
* Need to add KVO.
* Use AVPlayerItemVideoOutput, and CADisplayLink to pull frames.
* TODO - Set the MTAudioTapProcessor.
* Still need to figure out how to consume frames properly.
* TODO - Audio is heavily distorted.
* Add a rendering method for WebRTC audio.
* Hook up both audio tap and rendering inputs to the mixer.
* In this example we don't need any fixed size buffers or other pre-allocated resources. We will simply write
* directly to the AudioBufferList provided in the AudioUnit's rendering callback.
*/
return YES;
Contributor Author:
Comment needs to be updated.

import Foundation
import MediaToolbox

class ExampleAVPlayerAudioTap {
Contributor Author:
This class can probably be removed, unless it's useful to demonstrate why we didn't write the audio code in Swift.

Contributor:
I guess let's remove it if it is not used.

attributes = [
kCVPixelBufferPixelFormatTypeKey as String : kCVPixelFormatType_420YpCbCr8BiPlanarFullRange
] as [String : Any]
}
Contributor Author:
We should explicitly add kCVPixelBufferIOSurfacePropertiesKey. I was doing it wrong before.

kCVPixelBufferIOSurfacePropertiesKey as String : [:]
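
For context, a short sketch of how these attributes, with the IOSurface key included, might be fed to the AVPlayerItemVideoOutput that pulls NV12 frames (a sketch only, not the exact diff):

```swift
import AVFoundation
import CoreVideo

// Pixel buffer attributes requesting 8-bit 4:2:0 NV12, backed by IOSurface.
let attributes: [String: Any] = [
    kCVPixelBufferPixelFormatTypeKey as String: kCVPixelFormatType_420YpCbCr8BiPlanarFullRange,
    kCVPixelBufferIOSurfacePropertiesKey as String: [:]
]
let videoOutput = AVPlayerItemVideoOutput(pixelBufferAttributes: attributes)
```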

platform :ios, '11.0'
project 'CoViewingExample.xcproject'

pod 'TPCircularBuffer', '~> 1.6'
Contributor Author:
Just to call it out, this PR is adding a dependency which is only used by the new example.

}
self.capturingContext->deviceContext = context;
self.capturingContext->maxFramesPerBuffer = _capturingFormat.framesPerBuffer;
self.capturingContext->audioBuffer = _captureBuffer;
Contributor Author:
Lines 355-361 should probably be moved into initializeCapturer().

@ceaglest (Contributor Author) commented Nov 9, 2018

Okay, so sample rate conversion is working. The next step is to revisit recording mixing. In the current commit you just get the player audio in stereo due to an early return for testing.


@piyushtank (Contributor) left a comment
Reviewed the Playback side of changes. Reviewing the app and processing tap next, and then the capturing side. 😅

ALWAYS_SEARCH_USER_PATHS = NO;
CLANG_ANALYZER_NONNULL = YES;
CLANG_ANALYZER_NUMBER_OBJECT_CONVERSION = YES_AGGRESSIVE;
CLANG_CXX_LANGUAGE_STANDARD = "gnu++14";
Contributor:
Do we want to change this in the sample app?

Contributor Author:
Let's use C++14 to be safe.

ALWAYS_SEARCH_USER_PATHS = NO;
CLANG_ANALYZER_NONNULL = YES;
CLANG_ANALYZER_NUMBER_OBJECT_CONVERSION = YES_AGGRESSIVE;
CLANG_CXX_LANGUAGE_STANDARD = "gnu++14";
Contributor:
same here

GCC_WARN_UNINITIALIZED_AUTOS = YES_AGGRESSIVE;
GCC_WARN_UNUSED_FUNCTION = YES;
GCC_WARN_UNUSED_VARIABLE = YES;
IPHONEOS_DEPLOYMENT_TARGET = 11.0;
Contributor:
Is this OK?

Contributor Author:
So far this is all I have tested on, and can validate. We should talk about what versions we want to support in the example.

@class ExampleAVPlayerAudioDevice;

typedef struct ExampleAVPlayerAudioTapContext {
__weak ExampleAVPlayerAudioDevice *audioDevice;
Contributor:
id<TVIAudioDevice> ?

Contributor Author:
I think we need some sort of callback protocol here. If it was just TVIAudioDevice I wouldn't know what custom method to call.
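
Purely as an illustration of the idea, a hypothetical callback protocol could look something like this; the protocol and method names are invented and are not part of this PR:

```swift
import CoreAudio

// Illustrative sketch of a tap-to-device callback protocol.
protocol AVPlayerAudioTapDelegate: AnyObject {
    // Called once the MTAudioProcessingTap knows the source format.
    func audioTapDidPrepare(sourceFormat: AudioStreamBasicDescription)
    // Called for each batch of PCM frames produced by the tap.
    func audioTap(didProduce bufferList: UnsafeMutablePointer<AudioBufferList>, frameCount: UInt32)
}
```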

@property (nonatomic, strong, nullable) TVIAudioFormat *renderingFormat;
@property (nonatomic, assign, readonly) BOOL wantsAudio;
@property (nonatomic, assign) BOOL wantsCapturing;
@property (nonatomic, assign) BOOL wantsRendering;
Contributor:
It would be nice to have documentation around wantsAudio, wantsCapturing and wantsRendering.

audioUnitDescription.componentFlags = 0;
audioUnitDescription.componentFlagsMask = 0;
return audioUnitDescription;
}
Contributor:
Is this method being used anywhere?

if (status != noErr) {
NSLog(@"Could not set stream format on the input bus!");
AudioComponentInstanceDispose(_voiceProcessingIO);
_voiceProcessingIO = NULL;
Contributor:
Should we dispose of the mixer as well? Perhaps we could have a strategy: set up the mixer first, and if _voiceProcessingIO runs into an error, dispose of the mixer too. WDYT?

AudioStreamBasicDescription playerFormatDescription = renderingFormatDescription;
if (self.renderingContext->playoutBuffer) {
playerFormatDescription.mSampleRate = self.audioTapContext->sourceFormat.mSampleRate;
}
Contributor:
A comment would be nice explaining why we are adapting to the buffer (AVPlayer) format.

CoViewingExample/ViewController.swift — outdated comment thread (resolved)
}
}

- (void)handleRouteChange:(NSNotification *)notification {
Contributor:
Did you get a chance to test this?

Contributor Author:
Not yet, though I do see route change events being fired during normal operation (ones that don't trigger a restart).

AudioConverterReset(context->captureFormatConverter);
context->captureFormatConvertIsPrimed = NO;
}
}
Contributor:
This file has format conversion code as well. We will be using ExampleAVPlayerProcessingTap.m in the AVAudioEngine audio device too, where the format conversion will not be required. Can we move the audio format conversion code outside this file? Another option is to have a processing tap per audio device, in which case we would need to add a layer to the folder structure inside AudioDevices.

Contributor:
Also, I noticed that the audio capturing side stops working when the app goes into the background. It resumes when coming back into the foreground.

// Adjust for what the format converter actually produced, in case it was different than what we asked for.
producerBufferList->mBuffers[0].mDataByteSize = ioPacketSize * bytesPerFrameOut;
// printf("Output was: %d packets / %d bytes. Consumed input packets: %d. Cached input packets: %d.\n",
// ioPacketSize, ioPacketSize * bytesPerFrameOut, context.sourcePackets, context.cachePackets);
Contributor:
There is some commented-out code in this file, but I guess you are aware of it and working on it.


if (status != kCVReturnSuccess) {
// TODO
return;
Contributor:
Do we need to delete the context / clean up?

if !output.hasNewPixelBuffer(forItemTime: itemTimestamp) {
// TODO: Consider suspending the timer and requesting a notification when media becomes available.
// print("No frame for host timestamp:", CACurrentMediaTime(), "\n",
// "Last presentation timestamp was:", lastPresentationTimestamp != nil ? lastPresentationTimestamp! : CMTime.zero)
Contributor:
FYI - this file has commented-out logs.

Contributor Author:
In this case, we should log once, suspend the timer and restart it later. This should be an easy fix. 👍
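
A sketch of that approach, assuming the frame pulls are driven by a CADisplayLink: pause the link when no frames are available and ask the video output to notify us when media data arrives again. The class and property names are illustrative; the real logic would live in ExampleAVPlayerSource.

```swift
import AVFoundation
import UIKit

class FramePuller: NSObject, AVPlayerItemOutputPullDelegate {
    let output: AVPlayerItemVideoOutput
    let queue = DispatchQueue(label: "frame-puller")
    var displayLink: CADisplayLink?

    init(output: AVPlayerItemVideoOutput) {
        self.output = output
        super.init()
        output.setDelegate(self, queue: queue)
    }

    func suspendPulls() {
        // Called when hasNewPixelBuffer(forItemTime:) keeps returning false.
        displayLink?.isPaused = true
        output.requestNotificationOfMediaDataChange(withAdvanceInterval: 0.1)
    }

    func outputMediaDataWillChange(_ sender: AVPlayerItemOutput) {
        // New media data is on the way; resume pulling on the next display refresh.
        displayLink?.isPaused = false
    }
}
```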

public var supportedFormats: [TVIVideoFormat] {
get {
let format = TVIVideoFormat()
format.dimensions = CMVideoDimensions(width: 640, height: 360)
Contributor:
A const would be nice.

import AVFoundation
import TwilioVideo

class ExampleAVPlayerSource: NSObject, TVIVideoCapturer {
Contributor:
I feel documentation around scheduling, sample queue and logic would be nice.

attributes: DispatchQueue.Attributes(rawValue: 0),
autoreleaseFrequency: DispatchQueue.AutoreleaseFrequency.workItem,
target: nil)
super.init()
Contributor:
Should super.init() be the first call?

@ceaglest (Contributor Author) commented Nov 10, 2018
The non-null sample queue needs to be set before initializing super, or else the compiler complains. I'm not an expert at initializing Swift classes, so this might be wrong.
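
For reference, a minimal sketch of why the assignment has to come first under Swift's two-phase initialization; the class and label names are illustrative:

```swift
import Foundation

class ExampleSource: NSObject {
    private let sampleQueue: DispatchQueue

    override init() {
        // Every stored property must have a value before delegating up to NSObject;
        // referencing `self` or calling super.init() before this point is a compile error.
        sampleQueue = DispatchQueue(label: "com.example.video-samples",
                                    qos: .userInteractive,
                                    attributes: DispatchQueue.Attributes(rawValue: 0),
                                    autoreleaseFrequency: .workItem,
                                    target: nil)
        super.init()
    }
}
```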

// Configure access token either from server or manually.
// If the default wasn't changed, try fetching from server.
if (accessToken == "TWILIO_ACCESS_TOKEN") {
let urlStringWithRole = tokenUrl + "?identity=" + name
Contributor:
Let's remove the Rooms demo URL specific logic from here.

CoViewingExample/ViewController.swift — outdated comment thread (resolved)
@srinivasdr commented Nov 4, 2020

@ceaglest When I run the above example, only the presenter's AVPlayer audio is audible on the viewer side, not the presenter's microphone audio. How can I make both the microphone and AVPlayer audio audible/shared, so that the viewer can hear both the presenter's microphone and AVPlayer audio?
