
3D Tracking Extension (without stereo camera) #406

Open · akretz opened this issue Jun 4, 2021 · 19 comments

akretz commented Jun 4, 2021

Hello everyone,

I'm working on a related project and would like to propose adding 3D tracking functionality to future versions of OpenDataCam. The approach in our project is to calibrate a traffic surveillance camera and then use this information to produce moving 3D boxes from a sequence of 2D boxes. The idea is to initially fit a moving 3D box to the 2D boxes and then use a nonlinear Kalman filter to track the vehicle in subsequent frames, refining the initial estimate iteratively.

We have made our research available on arXiv. Specifically, the ideas described in Sections III-D and III-E are the steps that need to be implemented to enable a 3D tracking mechanism. A 3D tracking extension would allow further use cases, such as estimating the speed of vehicles or estimating their 3D trajectories in the real world, as opposed to just on the 2D image plane.

Does this extension seem useful to anyone? I could try integrating the functionality into OpenDataCam myself, but it's been ages since I last wrote JavaScript. If anyone is interested in integrating it, feel free to let me know and I'll try to lend a hand.

vsaw (Collaborator) commented Jun 4, 2021

I am totally unfamiliar with this kind of technology, so please excuse my naive questions, but:

  • Does this work with regular 2D cameras, or is special equipment necessary?
  • What is the advantage of 3D tracking data over 2D tracking for traffic analysis?

Looking forward to your answers, as 3D tracking in general has popped up quite a few times in ODC, so I'd guess there's some interest there 👍

akretz (Author) commented Jun 4, 2021

@vsaw Our approach works with normal 2D cameras. The only extra information needed is the camera matrix, which you obtain by calibrating the camera. Using OpenCV, for example, you can calibrate a camera by recording a chessboard. No other special equipment is necessary.
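
For anyone who hasn't done this before, here is a minimal sketch of that calibration step using OpenCV's Python bindings (the 9x6 pattern size and the image path are placeholders, not values from our project):

```python
# Sketch of chessboard calibration with OpenCV. Pattern size and image
# paths are assumed placeholders for illustration.
import glob

import cv2
import numpy as np

PATTERN = (9, 6)  # inner corners per chessboard row/column (assumed)

# 3D coordinates of the chessboard corners in the board's own plane (z = 0).
objp = np.zeros((PATTERN[0] * PATTERN[1], 3), np.float32)
objp[:, :2] = np.mgrid[0:PATTERN[0], 0:PATTERN[1]].T.reshape(-1, 2)

obj_points, img_points = [], []
gray = None
for fname in glob.glob('calibration_images/*.jpg'):  # assumed path
    gray = cv2.cvtColor(cv2.imread(fname), cv2.COLOR_BGR2GRAY)
    found, corners = cv2.findChessboardCorners(gray, PATTERN, None)
    if found:
        obj_points.append(objp)
        img_points.append(corners)

# K is the 3x3 intrinsic camera matrix, dist the lens distortion coefficients.
ret, K, dist, rvecs, tvecs = cv2.calibrateCamera(
    obj_points, img_points, gray.shape[::-1], None, None)
print('intrinsic matrix K:\n', K)
```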

The main advantage that 3D tracking offers over 2D tracking is that you get the vehicles in real-world coordinates. That means you can do things like estimating their sizes and speeds or detecting whether they're performing an illegal maneuver.

vsaw (Collaborator) commented Jun 5, 2021

Thanks for the answer! Looking at your paper, I believe your 3D tracker will definitely be a good addition to ODC and can be integrated into OpenDataCam.

I'm not sure if you can implement this in the React frontend (after the node-moving-things-tracker has already calculated trajectories and assigned object IDs to the raw YOLO output) or if you need to replace the tracker and process the raw YOLO output.

In any case, we'd be happy to guide you through the code base and review your implementation, but we will most likely not have the capacity to actively contribute.

tdurand (Member) commented Jun 8, 2021

@akretz this is super interesting, thanks for sharing

I share the views of @vsaw: we are working on releasing v4 soon and don't have the capacity to work on this, but we can provide guidance.

May I ask: besides the paper, do you have some code implementing it to share, with a license compatible with OpenDataCam (i.e. not GPLv3)? (Computer vision stuff is usually implemented in Python.)

One question I have without reading the paper: does this need to process pixel (feature) data to work, or do you plug it directly into the output of YOLO (a list of bounding boxes per frame)? If it needs to process pixel data, it would be best implemented in C++ directly in darknet; otherwise it could be implemented in JavaScript in https://github.com/opendatacam/node-moving-things-tracker.

There is another issue for 3D tracking, but based on stereo cameras, which requires extra hardware. Let's keep both issues open and specify that that one is meant to work with special hardware, and this one with a "normal" camera.

tdurand changed the title from "3D Tracking Extension" to "3D Tracking Extension (without stereo camera)" on Jun 8, 2021

akretz (Author) commented Jun 8, 2021

@tdurand

> May I ask: besides the paper, do you have some code implementing it to share, with a license compatible with OpenDataCam (i.e. not GPLv3)? (Computer vision stuff is usually implemented in Python.)

Unfortunately, the prototype we're using to implement the theory of our paper is not open source, so I cannot share it. I can, however, reimplement the parts we've published to make them work in the OpenDataCam framework. In that case, I'd use a permissive license, so licensing won't be an issue.

> One question I have without reading the paper: does this need to process pixel (feature) data to work, or do you plug it directly into the output of YOLO (a list of bounding boxes per frame)? If it needs to process pixel data, it would be best implemented in C++ directly in darknet; otherwise it could be implemented in JavaScript in https://github.com/opendatacam/node-moving-things-tracker.

We are not using pixel information at all. The only information we process is the bounding boxes output by YOLO, so it might make sense to implement this in node-moving-things-tracker. It might also make sense to implement the 3D functionality in a completely new node module, in the spirit of separation of concerns.

> There is another issue for 3D tracking, but based on stereo cameras, which requires extra hardware. Let's keep both issues open and specify that that one is meant to work with special hardware, and this one with a "normal" camera.

I agree that it makes sense to keep both issues separate.

I'm going to give it a try and integrate it into OpenDataCam when I find the time. I'll let you know when I've finished a first implementation.

rantgithub commented

We received a similar request a couple of months ago, and the implementation can be done with the current tracker.

Based on the paper, the 3D object will be inside the original box. This can be accomplished by performing some geometry calculations based on the original center point and size of the box.

The issue that remains is performance, because we are no longer tracking only one point (the center) but multiple points to build the 3D objects from the 2D ones. For a couple of objects this will be fine, but when you have 50-100 objects in the feed/video, it puts a heavy load on the frontend. For this, some backend modifications will be needed.

Happy to share some ideas on this as well. :)

tdurand (Member) commented Jun 8, 2021

@akretz awesome, let us know if you need any guidance. Maybe start with something separate from node-moving-things-tracker, and later integrate it and have two modes, with/without 3D tracking, if it works out.

@rantgithub I share your performance concerns, but this wouldn't put any load on the front-end, just on the backend. It will surely be more computationally intensive than the simple tracker, but I guess as long as it does not use features (pixel data) we should be fine.

akretz (Author) commented Jun 8, 2021

@rantgithub I don't think performance should be a big issue, because we're not tracking corner points separately. As you've said, we're just doing geometry-based computations, assuming the road is a flat plane, to find the 3D box that best fits the 2D boxes. After the initial fit, we use a Kalman filter for each object to refine our estimates, which is nothing more than some small matrix computations.
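
To give a feel for the geometry, here is a toy sketch of projecting a candidate 3D box (resting on the flat road plane) through a 3x4 camera projection matrix and scoring it against a detected 2D box. All names and parameters are made up for illustration; this is not our actual code:

```python
# Toy sketch: project a candidate 3D box (base on the road plane z = 0)
# through a 3x4 camera projection matrix P and compare its image-plane
# footprint with a detected 2D box.
import numpy as np

def project_box(P, center_xy, length, width, height, yaw):
    """Image-plane bounding box (x1, y1, x2, y2) of a projected 3D box."""
    hx, hy = length / 2.0, width / 2.0
    corners = np.array([[sx * hx, sy * hy, z]
                        for sx in (-1.0, 1.0)
                        for sy in (-1.0, 1.0)
                        for z in (0.0, height)])   # base sits on z = 0
    c, s = np.cos(yaw), np.sin(yaw)
    R = np.array([[c, -s, 0.0],
                  [s,  c, 0.0],
                  [0.0, 0.0, 1.0]])                # rotation about the vertical axis
    world = corners @ R.T + np.array([center_xy[0], center_xy[1], 0.0])
    homog = np.hstack([world, np.ones((8, 1))])    # homogeneous coordinates
    img = (P @ homog.T).T
    img = img[:, :2] / img[:, 2:3]                 # perspective divide
    return img[:, 0].min(), img[:, 1].min(), img[:, 0].max(), img[:, 1].max()

def bbox_error(predicted, detected):
    """Sum of absolute corner differences between two (x1, y1, x2, y2) boxes."""
    return float(np.abs(np.array(predicted) - np.array(detected)).sum())
```

Searching over the box parameters (position, yaw, dimensions) for the smallest bbox_error gives the initial fit; the Kalman filter then refines it frame by frame.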

I haven't looked into the current tracker yet, but from my understanding it basically associates boxes from subsequent frames based on their pairwise IOUs. That's precisely how we do it in our work. The tracker would stay the same; the 3D computations would just be something on top. I'm confident the performance bottleneck will still be the actual YOLO detections and not the 3D computations.
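
For reference, the IOU measure this kind of association relies on is only a few lines (a generic sketch, not the actual tracker code):

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)
```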

tdurand (Member) commented Jun 8, 2021

> I haven't looked into the current tracker yet, but from my understanding it basically associates boxes from subsequent frames based on their pairwise IOUs. That's precisely how we do it in our work. The tracker would stay the same; the 3D computations would just be something on top. I'm confident the performance bottleneck will still be the actual YOLO detections and not the 3D computations.

Yep, the current tracker does just that. For Kalman filters in JavaScript you may want to have a look at this lib: https://github.com/wouterbulten/kalmanjs. Not sure how good it is, but I had it bookmarked.

akretz (Author) commented Jun 8, 2021

@tdurand Thanks for the hint! However, it seems that library supports 1D data only. I might just implement one myself; it's not super difficult. One important thing to keep in mind is to do the calculations in a numerically stable way, but for that I can look at the computations in other well-known implementations such as FilterPy.
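
To illustrate the structure, here is the skeleton of such a filter using FilterPy's linear KalmanFilter: a constant-velocity model on the road plane with made-up noise parameters. Our filter is nonlinear, and FilterPy also offers ExtendedKalmanFilter and UnscentedKalmanFilter for that case:

```python
# Minimal constant-velocity Kalman filter on the road plane with FilterPy.
# State: [x, y, vx, vy]; measurement: [x, y]. All numbers are placeholders.
import numpy as np
from filterpy.kalman import KalmanFilter

dt = 1.0 / 25.0                      # time step, assuming 25 FPS

kf = KalmanFilter(dim_x=4, dim_z=2)
kf.F = np.array([[1, 0, dt, 0],      # state transition: constant velocity
                 [0, 1, 0, dt],
                 [0, 0, 1,  0],
                 [0, 0, 0,  1]])
kf.H = np.array([[1, 0, 0, 0],       # we only measure position
                 [0, 1, 0, 0]])
kf.P *= 10.0                         # initial state uncertainty
kf.R *= 0.5                          # measurement noise
kf.Q *= 0.01                         # process noise

for z in [np.array([1.0, 2.0]), np.array([1.1, 2.2])]:  # fake positions
    kf.predict()
    kf.update(z)
print(kf.x)                          # filtered [x, y, vx, vy]
```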

tdurand (Member) commented Jun 8, 2021

Yes, maybe there is a good Kalman filter JavaScript library out there. Also, if you don't have time to deep-dive into JavaScript, an implementation in Python could be ported later on. Anyway, keep in touch, and thanks for sharing this!

unrzn0 commented Jun 9, 2021

I had a similar idea: with the current OpenDataCam framework, one could add not just one counting line per track but two lines, and get at least a vague approximation of speed by counting the number of frames in between. The approach described above seems more sophisticated and superior; the simple approach seems, well, simple (and maybe enough for finding out how many cars are speeding).
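
For what it's worth, the arithmetic behind the two-line idea is trivial once the real-world distance between the lines is known (all numbers below are made up):

```python
# Back-of-the-envelope speed estimate from two counting lines:
# average speed is just distance over elapsed time.
def speed_kmh(line_distance_m, frames_between, fps):
    """Average speed in km/h of a track crossing two lines."""
    seconds = frames_between / fps
    return line_distance_m / seconds * 3.6

print(speed_kmh(line_distance_m=20.0, frames_between=36, fps=25.0))  # 50.0 km/h
```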

akretz (Author) commented Jun 18, 2021

@vsaw @tdurand I have started implementing the 3D tracker in a new npm package, 3d-vehicles. I've also extended OpenDataCam with a new 3D view in my fork. Here's a screenshot of the current state of the 3D view: it draws the axes of the real-world coordinate system around the origin, as well as the 3D states of the cars.

[Screenshot of the 3D view]

I'm using a simulated environment I generated in CARLA, so obtaining the camera matrix is trivial in this case. In a real-world scenario, you'd have to estimate the camera matrix by calibrating the camera and then estimating its pose in the real world. You can do that with OpenCV, for example; they offer a nice tutorial on how to accomplish it. After you've estimated the camera matrix, the same ideas that work in the simulated environment also apply in real-world scenarios.

The 3d-vehicles package is a very early implementation and doesn't include the 3D Kalman filter yet. It just waits for a vehicle to move more than a certain threshold, estimates its orientation from that movement, and then fits a 3D shape to the 2D detections. This is only a rough estimate, which is why you'll probably see a lot of jitter in the 3D shapes when you run it. The plan is to implement the Kalman filter in the next step to produce more accurate estimates.
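
To sketch the orientation-from-movement heuristic (an illustration, not the actual 3d-vehicles code; the threshold value is a made-up placeholder):

```python
# Once a vehicle's ground-plane position has moved more than a threshold,
# take the heading of the displacement vector as the box orientation.
import math

MIN_DISPLACEMENT_M = 1.0  # assumed threshold

def estimate_yaw(first_pos, current_pos):
    """Yaw angle in radians from the displacement on the road plane,
    or None while the vehicle hasn't moved far enough."""
    dx = current_pos[0] - first_pos[0]
    dy = current_pos[1] - first_pos[1]
    if math.hypot(dx, dy) < MIN_DISPLACEMENT_M:
        return None
    return math.atan2(dy, dx)
```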

tdurand (Member) commented Jun 21, 2021

Wow 🔥, impressive progress!!

Let us know if you need any help. I had a very quick look at the fork and it looks very clean and mergeable into the main repo at some point.

If you aim to integrate this to the core, make sure to:

  • branch from the development branch (I think you did)
  • add a 3D tracking boolean somewhere in the config file, off by default (as it only works for cars, am I right?)
  • add some documentation (can be done after the technical work is done)

Regarding the camera calibration: if I understand correctly, there is no default calibration per camera model (for example Logitech 920, etc.), but rather you need to calibrate manually whenever you change cameras?

In that case we would need good documentation on how to do this calibration. The calibration step then yields some parameters that the user would put in the config file; am I understanding this correctly?

akretz (Author) commented Jun 21, 2021

> Wow 🔥, impressive progress!!

Thank you very much!

> Let us know if you need any help. I had a very quick look at the fork and it looks very clean and mergeable into the main repo at some point.

One thing that came up: is there an easy way to get the FPS of the input video stream? One important use case for 3D tracking is estimating vehicle speeds, and for that it's crucial to know the FPS of the video. I know it's trivial to obtain the FPS of the YOLO detections, but that number can fluctuate. The only way I can think of is modifying the YOLO fork to output not only the resolution but also the FPS. Is there perhaps an easier way that I haven't thought of?

> If you aim to integrate this to the core, make sure to:

> * branch from the `development` branch (I think you did)

Yes, I did that. I guess the best way forward would be if I keep maintaining my fork and then merge it back into development some time after the v4 release.

> * add a 3D tracking boolean somewhere in the config file, off by default (as it only works for cars, am I right?)

Yes, it works with cars and other vehicles only.

> * add some documentation (can be done after the technical work is done)

Sounds like a good idea!

> Regarding the camera calibration: if I understand correctly, there is no default calibration per camera model (for example Logitech 920, etc.), but rather you need to calibrate manually whenever you change cameras?

> In that case we would need good documentation on how to do this calibration. The calibration step then yields some parameters that the user would put in the config file; am I understanding this correctly?

Estimating the camera matrix consists of two steps: first, you calibrate the camera, and then you estimate its pose in the real world after you mount it somewhere above a street. The first step you do once per camera and then you're done. The second step you have to repeat every time you install the camera somewhere, and it requires a mapping between 3D real-world coordinates and pixel coordinates. For example, you could draw a 1 m x 1 m square on the road and then find the pixel coordinates of the square's corners in your camera image.

It's relatively easy to do these steps with OpenCV, but I can see that it might look a little intimidating to someone who isn't familiar with the terminology. That's why I think it's a good idea for me to write documentation on how to do it. I'll see when I find the time.
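
To make the second step concrete, here is a minimal sketch with OpenCV's Python bindings, using the painted 1 m x 1 m square. The intrinsics and the pixel coordinates below are made-up placeholders:

```python
# Pose estimation from four known ground points: map the corners of a
# 1 m x 1 m square on the road (world coordinates, z = 0) to their pixel
# coordinates in the image. K and dist come from the one-time calibration.
import cv2
import numpy as np

K = np.array([[1000.0, 0.0, 960.0],      # intrinsics from calibration (placeholder)
              [0.0, 1000.0, 540.0],
              [0.0, 0.0, 1.0]])
dist = np.zeros(5)                       # distortion coefficients (placeholder)

world_pts = np.array([[0.0, 0.0, 0.0],   # the square's corners in meters
                      [1.0, 0.0, 0.0],
                      [1.0, 1.0, 0.0],
                      [0.0, 1.0, 0.0]])
pixel_pts = np.array([[812.0, 620.0],    # hand-picked in the camera image
                      [905.0, 618.0],
                      [918.0, 680.0],
                      [808.0, 684.0]])

ok, rvec, tvec = cv2.solvePnP(world_pts, pixel_pts, K, dist)
R, _ = cv2.Rodrigues(rvec)               # rotation vector -> rotation matrix
P = K @ np.hstack([R, tvec])             # full 3x4 camera projection matrix
print(P)
```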

tdurand (Member) commented Jun 21, 2021

> Yes, I did that. I guess the best way forward would be if I keep maintaining my fork and then merge it back into development some time after the v4 release.

Definitely, let's do that after the v4 release.

> Estimating the camera matrix consists of two steps: first, you calibrate the camera, and then you estimate its pose in the real world after you mount it somewhere above a street. The first step you do once per camera and then you're done. The second step you have to repeat every time you install the camera somewhere, and it requires a mapping between 3D real-world coordinates and pixel coordinates. For example, you could draw a 1 m x 1 m square on the road and then find the pixel coordinates of the square's corners in your camera image.

> It's relatively easy to do these steps with OpenCV, but I can see that it might look a little intimidating to someone who isn't familiar with the terminology. That's why I think it's a good idea for me to write documentation on how to do it. I'll see when I find the time.

As this would be a very "niche" feature for now, I don't mind if it is complex, as long as it's documented properly. Maybe this will help some people solve specific use cases.

tdurand (Member) commented Jun 21, 2021

> One thing that came up: is there an easy way to get the FPS of the input video stream? One important use case for 3D tracking is estimating vehicle speeds, and for that it's crucial to know the FPS of the video. I know it's trivial to obtain the FPS of the YOLO detections, but that number can fluctuate. The only way I can think of is modifying the YOLO fork to output not only the resolution but also the FPS. Is there perhaps an easier way that I haven't thought of?

For now you can read this variable to get the FPS inferred from the YOLO detections: https://github.com/opendatacam/opendatacam/blob/development/server/Opendatacam.js#L272

If you need something more accurate, then yes, I think adding something to the darknet fork would be the way to go.

vsaw (Collaborator) commented Jun 24, 2021

@akretz Thanks for the code. The implementation looks straightforward, and I like that you added a new 3D view.

As @tdurand already mentioned maintainability: from my side, I encourage you to look at the GPS tracker implementation and how it manages to attach GPS metadata to each tracked item without needing to modify the Opendatacam.js file.

It is basically a subclass of the node-moving-things-tracker, overriding the getJSONOfTrackedItems method. I believe something similar can be done for your 3D tracker as well, which will make maintaining things much easier for you and us.

akretz (Author) commented Jun 24, 2021

@vsaw Good point, thanks for the hint! I'll see if I can refactor my code to follow the same pattern.
