Add slide segments and extracted text to harvesting and DB, enable paella slide previews #1163
base: master
(Just code review, no testing yet)
Looks good overall. I have a few small nits, but only one bigger one: how to store the `startTime`.
backend/src/db/migrations.rs (outdated)
```diff
-    32: "event-slide-text",
+    32: "event-slide-data",
```
Are the segments really "slide data"? Maybe just be verbose with `event-slide-text-and-segments`?
```sql
alter table events
    -- The default above was just for all existing records. New records should
    -- require this to be set.
    alter column segments drop default,
```
Should this also add a `not null` constraint here? I feel like we missed that in `14-event-captions`, too. We don't want the field to be null, right?
```typescript
return {
    id: "frame_" + time,
    mimetype: "image/jpeg",
    time: time,
```
Suggested change:
```diff
-    time: time,
+    time,
```
```sql
create type event_segment as (
    uri text,
    start_time text
);
```
I think I would prefer it if we stored this like `duration`, namely as the number of milliseconds in an int. The string takes more space, and I don't think we can use it for anything. The only "consumer" of this number seems to be Paella, which wants seconds anyway, and converting milliseconds to seconds is obviously much easier than this string parsing. So that basically means moving `ocTimeStringToSeconds` into the harvest code and converting before writing to the DB. Not sure what we should do in case the parsing fails... probably just ignore all segments?
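A minimal sketch of that suggestion in the harvest code, assuming the harvest API sends start times as MPEG-7 style media time points such as `T00:01:23:456F1000` (hours:minutes:seconds:fractions, with the number after `F` giving fractions per second). The actual string format is not stated in this thread, and `RawSegment`, `Segment`, `oc_time_string_to_ms`, `convert_segments` and `ms_to_seconds` are hypothetical names, not Tobira's code. It converts to integer milliseconds before the DB write and, as one possible answer to the open question, drops all segments if any single start time fails to parse.

```rust
/// Hypothetical segment as delivered by the harvest API.
/// Assumption: `start_time` is an MPEG-7 style media time point string.
struct RawSegment {
    uri: String,
    start_time: String,
}

/// Hypothetical DB row: start time as integer milliseconds, like `duration`.
#[derive(Debug, PartialEq)]
struct Segment {
    uri: String,
    start_time_ms: i64,
}

/// Parses strings like "T00:01:23:456F1000" into milliseconds.
/// Returns `None` for anything that does not match that (assumed) shape.
fn oc_time_string_to_ms(s: &str) -> Option<i64> {
    let rest = s.strip_prefix('T')?;
    let (clock, per_second) = rest.split_once('F')?;
    let per_second: i64 = per_second.parse().ok()?;

    // Parse the colon-separated components; any non-number fails the whole string.
    let parts = clock
        .split(':')
        .map(|p| p.parse::<i64>().ok())
        .collect::<Option<Vec<i64>>>()?;
    let (h, m, sec, frac) = match parts.as_slice() {
        [h, m, sec, frac] => (*h, *m, *sec, *frac),
        _ => return None,
    };

    // Reject out-of-range components. An empty `0..per_second` range also
    // rejects `per_second <= 0`, so the division below cannot divide by zero.
    let valid = h >= 0
        && (0..60).contains(&m)
        && (0..60).contains(&sec)
        && (0..per_second).contains(&frac);
    if !valid {
        return None;
    }

    // Checked arithmetic so absurd inputs yield `None` instead of overflowing.
    let total_secs = h.checked_mul(3600)?.checked_add(m * 60 + sec)?;
    let frac_ms = frac.checked_mul(1000)? / per_second;
    total_secs.checked_mul(1000)?.checked_add(frac_ms)
}

/// All-or-nothing: if any start time fails to parse, no segments are kept.
fn convert_segments(raw: Vec<RawSegment>) -> Vec<Segment> {
    raw.into_iter()
        .map(|r| {
            let start_time_ms = oc_time_string_to_ms(&r.start_time)?;
            Some(Segment { uri: r.uri, start_time_ms })
        })
        .collect::<Option<Vec<_>>>()
        .unwrap_or_default()
}

/// The only conversion left for Paella, which wants seconds
/// (written in Rust here purely for illustration).
fn ms_to_seconds(ms: i64) -> f64 {
    ms as f64 / 1000.0
}

fn main() {
    let raw = vec![
        RawSegment {
            uri: "https://example.org/frame0.jpg".to_string(),
            start_time: "T00:00:00:000F1000".to_string(),
        },
        RawSegment {
            uri: "https://example.org/frame1.jpg".to_string(),
            start_time: "T00:01:23:456F1000".to_string(),
        },
    ];
    for s in convert_segments(raw) {
        println!("{} at {} ms ({} s)", s.uri, s.start_time_ms, ms_to_seconds(s.start_time_ms));
    }
}
```

With this, `start_time` in `event_segment` could become an integer column (e.g. `bigint`) instead of `text`. Collecting into `Option<Vec<_>>` is what makes the fallback all-or-nothing; keeping only the parseable segments would just be a `filter_map` instead.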
It would have been even better to already just send the milliseconds in the harvest API, but I didn't catch that when looking at the OC PR. Although... we could still fix it? It's a Tobira internal API and Tobira has not used that yet, so we can fix it without bumping the API version... mhhh
```typescript
order: 102,
tabIndex: 17,
```
Those are some high numbers... is that intentional?
Oh good call. I copied that from https://paellaplayer.upv.es/#/playground and forgot to adjust those numbers.
This adds the OCR'd slide texts, as well as a list of timestamped frames, to the harvesting sync code and stores them in the DB. In order to show the slide previews, `paella-slide-plugins` was added and configured to use the timestamped frames.

Needs opencast/opencast#5757 to work. Once that is merged, released and used on our test Opencast, the changes can be tested with fresh uploads. We'll still need some mechanism to apply segmentation and OCR (and speech-to-text as well) to existing videos.

(Can be reviewed commit by commit, though note that the migration from the second commit was extended in the third.)