Why attend?
-
No marketing, ever.
Speakers are selected based on their submission, not how much money their company paid; we will never, ever sell a speaking slot. Attendee information isn't for sale either, not even to sponsors.
-
Affordable.
We want anyone in the industry to be able to come, which means keeping tickets reasonably priced (thanks largely to our generous sponsors). We also offer free and discounted tickets to students and open source contributors, so please reach out if you're interested.
-
For everyone in the community.
Our community is dedicated to providing an inclusive, enjoyable experience for everyone in the video industry. In this pursuit, and in keeping with our love for reasonable standards, we adopted the Ada Initiative's code of conduct.


Venue and Location
Bespoke
845 Market St, Suite 450, San Francisco, CA 94103
Bespoke is a large, configurable, high-tech conference space right in the heart of downtown San Francisco. It is easily accessible via BART and Muni. Bespoke is located on level 4 of the Westfield San Francisco Centre mall, next to Bloomingdale's.
Bespoke also played host to Demuxed in 2018, but we promise the chairs are much better this year!


The Talks
-
Nidhi Kulkarni
Mux
Just how live is your mobile live stream anyway?
How can you measure the latency of live video from the time it is ingested to when it plays back on a viewer’s device? In this talk, we describe how we calculate this metric using a methodology and heuristic that work on a variety of platforms, focusing on Android and iOS devices.
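For context, one generic way to approach such a measurement (a hedged sketch, not necessarily the speakers' heuristic): have the encoder stamp the stream with a wall-clock ingest time via timed metadata and compare it to the client clock at render time. `extractIngestTimestamp` below is a hypothetical helper for whatever metadata carriage is in use, and the clocks are assumed to be reasonably synchronized.

```typescript
// Sketch only: approximate ingest-to-display latency from an embedded wall-clock timestamp.
declare function extractIngestTimestamp(cue: VTTCue): number; // hypothetical: epoch ms stamped at ingest

function watchGlassToGlassLatency(metadataTrack: TextTrack): void {
  metadataTrack.addEventListener("cuechange", () => {
    const cue = metadataTrack.activeCues?.[0] as VTTCue | undefined;
    if (!cue) return;
    const ingestedAtMs = extractIngestTimestamp(cue);
    const latencyMs = Date.now() - ingestedAtMs; // assumes encoder and client clocks are in sync
    console.log(`approximate ingest-to-display latency: ${latencyMs} ms`);
  });
}
```
-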
Alex Converse
Twitch
RTMP done right: A roofshot
A great man once said we choose not to go to the roof because it looks good in your promo packet. We go to the roof because it's right there.
There's a ton of moonshot tech (from WebRTC to SRT and RIST to the new IETF media over QUIC) that seems well positioned to revolutionize live video ingestion in the coming years.
But RTMP is still deeply entrenched in many platforms and tools. What can we do with RTMP right now to make video ingestion work better?
This talk will walk through how to get better performance out of RTMP by managing the TCP connection appropriately for real-time data and by making better use of tools that have been in the RTMP spec this whole time.
As they say in the movie Hackers, "there's an Olympic-sized swimming pool on the roof." Bring your suit.
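As a rough illustration of the TCP-level care the talk alludes to (an assumption-laden sketch, not the speaker's code), a Node.js ingest client can disable Nagle's algorithm and watch its own send buffer; the host and threshold below are placeholders.

```typescript
// Sketch only: the kind of TCP-level housekeeping an RTMP ingest client can do in Node.js.
import * as net from "net";

const sock = net.connect({ host: "rtmp-ingest.example.com", port: 1935 }, () => {
  sock.setNoDelay(true); // disable Nagle so small real-time messages aren't held back
});

setInterval(() => {
  // Bytes queued in Node waiting to be flushed to the kernel. If this keeps growing,
  // the encoder is outrunning the network and should drop quality (or frames) at the source.
  if (sock.bufferSize > 512 * 1024) {
    console.warn(`send buffer backed up: ${sock.bufferSize} bytes`);
  }
}, 1000);
```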
-
Alex Zambelli
Warner Bros. Discovery
It's Time We Said Goodbye To Fractional Framerates
23.976, 29.97, 59.94. Nearly all of us working in video tech have encountered these oddly specific numbers more than once in our careers and can instantly recognize them as the digital video framerates most commonly used in North America (and a few other parts of the world). Millions of lines of code have been written by video engineers over the past decades to handle these framerates and the challenges that inevitably arise when trying to count whole frames as fractional values. So why do we go through all this trouble? Where did these strange framerates come from in the first place? Why did they prevail against saner framerates like 24, 30 and 60?
In my Demuxed talk I will dive into the history of video framerates, explain how technical design choices made by people in lab coats 70 years ago still haunt us even in the digital age - and then explain why we, the greatest streaming nerds who ever attended a Demuxed conference, have a golden opportunity to lead the entire video industry into the 21st century by embracing integer framerates and saying goodbye to unnecessarily dividing perfectly nice whole numbers by 1.001. There may even be a few FFmpeg demos to help jumpstart the revolution.
-
Ali C. Begen
Ozyegin University / Comcast
Catching the Moment in Low-Latency Live Streaming
Bandwidth prediction, which is already a difficult task, must be more accurate when lower latency is desired, because there is less time to react to bandwidth changes. Any inaccuracy in bandwidth prediction results in flawed rate-adaptation decisions, which will, in turn, translate into a diminished viewer experience. In this talk, we present several bandwidth prediction models (based on statistical and computational intelligence techniques) optimized for low latency, a rate-adaptation scheme (both heuristic and learning-based) and a playback speed control scheme to completely overhaul low-latency live streaming clients. The source code is publicly available, too.
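One of those three pieces, playback speed control, can be sketched in a few lines. This is a generic illustration under our own assumptions, not the presenters' open-source implementation: nudge `playbackRate` so the player holds a target live latency.

```typescript
// Sketch only: hold a target distance from the live edge by gently varying playbackRate.
function controlPlaybackRate(video: HTMLVideoElement, targetLatencySec = 3): void {
  setInterval(() => {
    const buffered = video.buffered;
    if (buffered.length === 0) return;
    // Distance from the playhead to the live edge of the buffer (a proxy for live latency).
    const latency = buffered.end(buffered.length - 1) - video.currentTime;
    if (latency > targetLatencySec + 0.5) {
      video.playbackRate = 1.05; // slightly fast: catch up towards the live edge
    } else if (latency < targetLatencySec - 0.5) {
      video.playbackRate = 0.95; // slightly slow: rebuild a small safety buffer
    } else {
      video.playbackRate = 1.0;
    }
  }, 500);
}
```
-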
Amy Rice
syd<video>
One Manifest to rule them all…
CMAF as a standard has been a game changer for Encoding/Origin/CDN workflows and, like many video teams working on major streaming platforms, we were excited about the promise of finally getting closer to having a single format… HOWEVER, this is not ‘just another CMAF as a single format’ talk…
Add SSAI to the mix and there are still the same challenges to solve, which got us wondering… Could CMAF be about more than sharing the same video segments? What if, with the three major DRMs aligning on CBCS, we were able to produce a single (simpler) HLS manifest for all platforms, removing the complications we have with DASH around multi-period player support and publishing multiple $number/$time outputs?
With the myriad of devices out there, can we really get to one manifest to rule them all?
-
Christian Feldmann
Bitmovin
The Art in the Video Codec
What is a video decoder? In essence it's a highly specialized painting engine that operates using special instructions which are packed into something we call a ‘bitstream’. These instructions include things like “move block from A to B” or “paint in this direction”. A normal encoder tries to use these instructions as efficiently as possible to “paint” a video that is as close to a reference video as possible. Everybody can do that. But what if we use the tools that the decoder offers to modify, morph or filter images, or even to paint directly with those tools? With a modified custom encoder we can write a standard-conformant bitstream that produces really fascinating output. In this talk I want to reveal the art in the video codec, but also teach a bit about the inner workings of the video decoders that produce this art.
-
Christoph Guttandin
Media Codings
Time for the Timing Object
Did you know that there is a standard to synchronize time-sensitive media across devices? It's a draft. It's not implemented in any browser. But luckily that's not a huge problem. You can still use it today.
The Timing Object specification is a W3C draft proposed by the Multi-Device Timing Community Group. The work on the spec started over 7 years ago. Sadly it went largely unnoticed. At that time there was not much interest in precisely synchronizing media in the same tab or across devices.
But today features like Apple's SharePlay or BBC's Together are well known and heavily used. The TimingObject (which is the core piece of the Timing Object specification) is what makes it easy to implement the same functionality on any website.
A TimingObject is very flexible and is not limited to synchronizing media. You can synchronize anything that fits on a timeline. That could be timed advertisements, synchronized stats for a sports broadcast, subtitles, or a second screen app that shows relevant information which matches exactly what is shown on the main screen.
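To make that concrete, here is a minimal sketch of keeping a video element in step with a shared timeline. The `TimingObjectLike` interface below is a simplified assumption loosely based on the draft; the actual specification and the timingsrc reference implementation differ in detail.

```typescript
// Sketch: keep an HTMLVideoElement in step with a shared timeline.
interface TimingVector { position: number; velocity: number; timestamp: number; }
interface TimingObjectLike {
  query(): TimingVector;                                    // current position/velocity on the shared timeline
  addEventListener(type: "change", listener: () => void): void;
}

function syncVideo(video: HTMLVideoElement, timing: TimingObjectLike): void {
  const THRESHOLD = 0.15; // seconds of drift tolerated before correcting

  const correct = () => {
    const v = timing.query();
    const drift = video.currentTime - v.position;
    if (Math.abs(drift) > THRESHOLD) {
      video.currentTime = v.position;          // hard seek for large drift
    } else {
      video.playbackRate = v.velocity - drift; // small rate nudge for small drift
    }
    if (v.velocity === 0 && !video.paused) video.pause();
    if (v.velocity !== 0 && video.paused) void video.play();
  };

  timing.addEventListener("change", correct);  // the shared timeline jumped, paused or resumed
  setInterval(correct, 500);                   // periodic drift correction
}
```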
-
Cise Midoglu
Simula Research Laboratory
7 Things They Don’t Tell You About Streaming Analytics
The purpose of this talk is to debunk a number of streaming analytics myths, Ha-Joon Chang style.
We discuss 7 "things" which address common misconceptions ranging from marketing ploys (*ahem*), such as how we are all empowering our video analytics pipelines with AI, to legitimate confusions, such as what makes an appropriate QoE representation for a given stakeholder and/or use case.
Thing 1: Streaming analytics is not only for debugging errors.
Thing 2: The video community is not actually using AI for streaming analytics.
Thing 3: What we talk about when we talk about QoE might be wildly different.
Thing 4: We do not have enough standards for streaming analytics.
Thing 5: Energy consumption should be a bigger concern for streaming analytics.
Thing 6: Open source software and open datasets can also benefit commercial streaming analytics products.
Thing 7: Streaming analytics is not ready for the metaverse.
We provide use cases, examples and lessons learned from both research and industry.
Bite-sized takeaways guaranteed.
-
Dan Jenkins
Broadcaster VC
Yup... WebRTC still sucks
Indiana Jones is a hero, right? Well, WebRTC is a hero too, right?
Time after time, Indiana defeats evil and saves the day. Time after time, WebRTC saves the day as well, connecting people during the pandemic and bringing together technologies for new purposes. I'm here to tell you all is not what it seems, for WebRTC or Indiana.
-
Dan Sparacio
Paramount
Caption Me If You Can
Captions, subtitles, and forced narratives increase the reach of viewership, allowing more people access to more content. In this talk I will discuss the challenges and solutions our video team uncovered while implementing timed text features for a global audience at Paramount. I will focus on the team’s experiences with connected TVs, set-top boxes and game consoles that support Media Source APIs. This talk will take a JavaScript player perspective on multilingual text, forced narratives, and navigating format and content preparation, all while meeting FCC regulations in order to supply the best viewing experience for all users.
-
Derek Buitenhuis
Vimeo
Are Video Codecs... Done?
Remember all those non-FAANG entities who deployed HEVC and VP9? Yeah, me neither. AV1? Eh. VVC? lol.
Are newer, more complex codecs even useful for anyone but megacorps? Have we reached The End, like we have for audio codecs, where only telcos are interested in lowering bandwidth further to save money? Have we reached the point where the compute/bandwidth costs will never again make sense for small/medium players, who are too big for hardware encoding, too small to reap the benefits of software encoding, and too small to invest in bespoke hardware?
Similarly, does anyone under the age of 50 work on codecs anymore? Have we made the barrier to entry so high that you need to spend 10 years banging your head against esoteric papers to understand everything in VVC? Are we all doomed to glue things together?
Are video codecs dead? This is an open question. Let's discuss.
-
Hojatollah Yeganeh
SSIMWave
What HDR format preserves creative intent better?
Tone mapping plays a pivotal role in rendering HDR content. No matter how expensive your HDR TV is or which format (HDR10, HLG, Dolby Vision or HDR10+) is used to deliver your favourite show to your living room, a tone-mapping operation is inevitable, as no consumer TV can reproduce 1,000 or 4,000 nits. Tone mapping almost always introduces structural detail loss to the creative intent unless the dynamic range of the content is fairly low.
In this talk Dr. Hojat Yeganeh will show a comparison between top HDR formats using structural fidelity maps and scores that provide deep insights into how well each HDR format preserves creative intent.
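For readers who have never looked at one, a deliberately naive tone-mapping curve is sketched below. It is an illustration under assumed peak values, nothing like a production HDR10/HLG/Dolby Vision pipeline, but the highlight compression it produces is exactly the kind of structural detail loss the talk's fidelity maps quantify.

```typescript
// Sketch only: an extended-Reinhard curve squeezing content mastered at 1000 nits
// into a display that peaks at 500 nits.
function toneMapNits(sceneNits: number, contentPeakNits = 1000, displayPeakNits = 500): number {
  const l = sceneNits / displayPeakNits;              // scene luminance, display-relative
  const lWhite = contentPeakNits / displayPeakNits;   // the value that should map to peak white
  const mapped = (l * (1 + l / (lWhite * lWhite))) / (1 + l);
  return Math.min(mapped, 1) * displayPeakNits;       // back to nits, clamped to the display
}

// 800 nits of highlight detail ends up around 431 nits here: the top of the range is compressed.
console.log(toneMapNits(800));
```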
-
Dylan Armajani
Akamai
Start Asking The Right Capacity Questions
Capacity planning is critical for any application pushing a lot of bits…but not all capacity is the same. Capacity and QoE are intrinsically tied together. In this short presentation we’ll discuss the nuances of capacity planning that are often overlooked, the questions one should be asking their delivery partners, and why one should be skeptical of any 10,000-foot-view figures put out by vendors.
-
Emmanuel Papirakis
Amazon
When Whack-a-Mole Won't Work: Enhancing the Durability of Video Connections Over Anycast
In anycast, a collection of servers from around the world share the same IP address. Since packets are routed based on the destination IP, there are multiple valid routes that can get packets to where they need to go, potentially landing on different servers. Intuition tells us that the shortest route will be selected, and the outcome should be that every video connection will land on the server that is closest to the client. This is a good thing and should ensure the best possible latency.
This approach is ideal for simple protocols where a single packet is needed for both the request and the response. But what about long-lived video connections? What happens when the shortest and best path for the packets of a connection changes? The connection breaks, and the client needs to reconnect, reauthenticate and continue where they left off. In this session, we will discuss how HTTP/3 and QUIC, along with XDP and tunneling, can be used to overcome this situation at planetary network scale.
-
Guillaume Bichot
Broadpeak
Low latency server side adaptive bitrate streaming supersedes client side algorithms
In ABR (adaptive bitrate) streaming, the player selects the quality (bitrate) as a function of its estimate of the maximum available bandwidth and of the buffer level. The player has one objective: to maximize the quality of experience (QoE) perceived by the user while avoiding rebuffering. Bandwidth estimation performed by the player is usually based on HTTP (the application layer), which does not work properly in some situations, such as CMAF low latency. With the latter, a video segment is split into chunks that are coded and transmitted continuously in such a way that a segment is sent and received roughly at the video bitrate. Relying on the buffer level is not really feasible either, as with low latency the aim is to keep the buffer as small as possible. To mitigate this problem, several player-side algorithms have been proposed and tested through a common test framework.
We have adapted that test environment to evaluate a server-side approach wherein the server decides the video segment’s bitrate/quality based on a bandwidth estimate taken from the underlying transport layer’s congestion control (e.g. BBR).
Several test campaigns have shown the merits of the server-side approach, which beats the best client-side algorithms.
In my presentation, I introduce the basics of the server-side ABR approach (in particular how it fits with standard client-side approaches such as MPEG-DASH), give details of the test bed, and present measurement results compared with the best-known client-side algorithms.
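A toy sketch of the server-side decision (our own simplification, not Broadpeak's implementation): take the bandwidth estimate exposed by the congestion controller and pick the highest rendition that fits under it with some headroom.

```typescript
// Sketch only: server-side rendition selection from a transport-layer bandwidth estimate.
interface Rendition { bitrateBps: number; path: string; }

function pickRendition(
  renditions: Rendition[],   // ladder, assumed sorted ascending by bitrate
  deliveryRateBps: number,   // estimate exposed by the congestion controller (e.g. BBR's delivery rate)
  safetyFactor = 0.8,        // leave headroom for bursts and chunked transfer overhead
): Rendition {
  const budget = deliveryRateBps * safetyFactor;
  let chosen = renditions[0];                 // never go below the lowest rendition
  for (const r of renditions) {
    if (r.bitrateBps <= budget) chosen = r;   // highest rendition that still fits the budget
  }
  return chosen;
}
```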
-
Hadi Amirpour
Universität Klagenfurt
Live is Life: Per-Title Encoding for Live Video Workflows
While live video streaming is expected to continue growing at an accelerated pace, one potential area for optimization that has remained relatively untapped is the use of content-aware encoding to improve the quality of live contribution streams. For those who have tried, it has been quite challenging due to the additional latency that complexity analysis adds to the streaming framework. However, it merits further investigation, as optimizing live video coding workflows may result in (i) decreased storage and/or delivery costs and (ii) increased QoE.
In this talk, we introduce a revolutionary video complexity metric that operates in real time and enables the adoption of per-title encoding (among other techniques) for live video workflows. We will explain and illustrate the key components of this metric and highlight the open source repository, including how video developers can utilize it in their video workflows. Finally, we will present selected applications (i.e., per-title encoding among others) and showcase their performance compared to state-of-the-art techniques for live video workflows.
Video developers will gain a deep understanding of video complexity, how to extract it in real-time, and how to use it in live video workflows.
-
Jean-Baptiste Kempf
Videolan
Updates from the Open Source Multimedia community
One year after the last Demuxed, we will share updates from the open source multimedia community.
We will notably speak about the FFmpeg 5.0/5.1 releases, the features and improvements they contain, and how they compare to what was scheduled, including the API additions of the 5.1 release.
We'll then speak about the new features being worked on for FFmpeg 6.0 and the release schedule.
Updates about dav1d, notably the 1.0.0 release, will be shared too, with an emphasis on performance testing and the potential improvements to AV1 decoding still to come.
Finally, we'll speak about other open source libraries, like x264 and placebo, and their updates for this year.
-
Joey Parrish
UpFish: Scripted Audio Manipulation for Streaming Media
In 2004, Brad Neeley released an alternate audio track to turn the first "Harry Potter" movie into "Wizard People, Dear Reader", a hilarious retelling in which Harry and friends drink and swear a lot. This off-color Riff-Tracks-esque release enjoyed many years of being passed around on CDs and DVDs, only to die out in the era of streaming as physical media became old-fashioned.
UpFish is a magical, open-source browser extension that uses the WebAudio API to bring "Wizard People" back to life on the streaming site of your choice. Just stream "Harry Potter" and enable the "Wizard People" script in the extension, et voila! You get the original audio, karaoke'd on-the-fly to remove the actors' voices, and Brad Neeley's dulcet tones laid over the original score. Watch as Harry, The Wretched Harmony, and Ronnie the F'ing Bear battle sick-ass draculas to rule the school, then finish the evening with a nightcap. A 2-minute chapter to give you a taste: https://www.youtube.com/watch?v=Bb9_UCNnBTc
But UpFish is more than that! With an open scripting format for audio manipulation, you can publish your own Riff-Tracks-style soundtracks to anything you want.
In addition to the custom scripting options, the extension comes with a built-in script for generic karaoke filtering on anything, a script to turn "Harry Potter" into "Wizard People", and one more to hack the classic Alfred Hitchcock film "Vertigo" into "Fartigo", in which Jimmy Stewart has gas every time he gets scared... or happy... or bored.
Available now! https://upfish.fans/
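For the curious, the on-the-fly karaoke trick boils down to a classic WebAudio pattern; the sketch below illustrates that pattern under our own assumptions and is not UpFish's actual code. Dialogue is usually mixed to the center, so summing the left channel with a phase-inverted right channel cancels most of it while leaving the score largely intact.

```typescript
// Sketch only: center-channel ("karaoke") cancellation with the WebAudio API.
function karaokeFilter(media: HTMLMediaElement): AudioContext {
  const ctx = new AudioContext();
  const source = ctx.createMediaElementSource(media); // the element's audio now routes through this graph

  const splitter = ctx.createChannelSplitter(2);
  const invert = ctx.createGain();
  invert.gain.value = -1;            // phase-invert the right channel
  const sum = ctx.createGain();      // summing point: left + (-right)

  source.connect(splitter);
  splitter.connect(sum, 0);          // left channel straight in
  splitter.connect(invert, 1);       // right channel through the inverter
  invert.connect(sum);

  sum.connect(ctx.destination);      // center-mixed dialogue largely cancels out
  return ctx;
}
```

A replacement narration track could then be mixed in through another gain node connected to the same destination.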
-
Kieran Kunhya
Open Broadcast Systems
Live video at 150 miles per hour!
Motor racing is a technically challenging sport, and getting onboard video from racing cars at 150 miles per hour is a massively complicated problem - heat, vibration, power and speed, to say the least. This presentation will explain these challenges, how we solved them in order to obtain some of the most exciting and dramatic footage possible in sport, and how 5G will significantly change the way onboard racing video can be produced. It will show how we have delivered reliable onboard video at several world-famous racetracks.
-
Leon Lyakovetsky
Podium
Why Video Captioning Needs Built-In Viewer Feedback in 2022 (and How We Do It)
We've all heard the hype, excitement, and fear of how AI systems are getting smarter & smarter, developing sentience, and generally taking over the world, creating a future of subjugation and despair for the human race. However, I am fairly confident that this bleak picture is not in our near future because there is one major problem: AI doesn’t even understand us that well.
Anyone who has used voice recognition on their phone or in their car will recognize that speech-to-text technology still has a long way to go. In the video world, this is nowhere more obvious than in auto-generated video captioning.
While auto-generated captions are better than no captions at all - incorrect spellings, wrong words, bad punctuation, and misplaced phrasing breaks among other discrepancies mean that human review and improvement is still needed for captions to accurately represent what is said and heard in videos. (If you’re watching a video for any long length of time and are not noticing any errors, that is thanks to human review!)
Accuracy in captioning is not a trivial matter, since captioning errors are not just a minor annoyance. ADA accessibility compliance demands 99% accurate captions, speaker labels, and phrase breaks, among other features that none of the auto-generated captioning services on the market today meet.
Yet most auto-generated caption errors can be fixed by far more people than just costly transcribers.
That’s why I propose that while the speech recognition wizards keep improving their methods and services, it is on us video engineers to allow interested viewers, those who are already watching and interested in fixing errors they see, the chance to easily give feedback to improve transcriptions of both recorded and live video. The goal of this is to increase accuracy and watchability for fellow viewers while also giving the machines better and better data to keep on improving.
In this talk, I will give a brief review of current speech-to-text technology, where it is limited, and why it will be limited until completely new techniques come along. Then I will outline both high-level ideas and actionable steps for video developers to add more feedback systems into their video players. This includes a demo that proposes updates to video player UIs for viewers to be able to easily give feedback, a backend that handles the inputs of an open-ended crowdsourced system in a productive manner, and updates to the caption file formats we use to capture this feedback effectively.
One day in the future, every video will be captioned 100% correctly by automation. But until that day, it's on us to incorporate simple feedback systems so that every video has the chance to be captioned correctly!
-
Marc Höppner
Akamai
A content owner, a CDN and a player walk into a bar.
Rebuffering. Since the 1950s, we've landed a man on the moon, sequenced the human genome, put a rover on Mars and developed self-driving cars. Why do we still have rebuffering in 2022? Content owners blame the CDN and the player, the CDN blames the content owner and the player, and the player blames the CDN and the content owner. Let’s stop the blame game – and have a mutual look at the data!
We have analyzed massive amounts of data over a period of more than 2 months. For each day, we have analyzed 15 billion client-to-edge requests for more than 10 million streaming sessions per day. We’ve focused on a VOD, multi-CDN setup with ad insertion. Based on a combination of client-side quality of experience data (via Common Media Client Data) and server-side quality of service data, we have classified buffer starvation events. We categorize them into buckets for where they could have been prevented: 1) at the player, 2) at the last mile, 3) at the CDN/origin or 4) at the workflow level. Leveraging the learnings from this unique data set enables us to significantly reduce end-users' rebuffering by taking advantage of the interplay of the player, the CDN and the workflow.
A content owner, a CDN and a player walk into a bar, and the user has a great streaming experience. Mission accomplished.
-
Mario Guggenberger
Bitmovin
A journey from manual to fully automatic video player testing
Manually testing the functionality of a video player takes time, and when a growing set of features and supported platforms leads to an exploding demand for test executions that causes a bottleneck in your development process, it is time to reconsider.
I will tell the story of how Bitmovin's test automation team incrementally transformed tedious manual testing into a fully automated system that executes 150,000 player tests and customer stream conformance tests every day on a worldwide distributed fleet of desktop computers, smartphones, smart TVs, gaming consoles and streaming sticks. More specifically, I will discuss writing a test framework, handling test results and flaky tests, unified cross-platform execution, device selection, software and hardware automation, platform quirks, SDLC integration, remote device usage, getting through lockdowns without interruption, and how we expose the system even to our customers.
-
Mattias Buelens
THEO
Baby’s first HTML5 <video> element
An HTML5 player for HLS or MPEG-DASH uses the HTML5 <video> element and the Media Source Extensions API to buffer fragmented MP4 segments and play them out as a smooth video stream. But what actually goes on behind the scenes of these web APIs? What happens when you play or seek the video, and how does the video element maintain smooth playback without dropping frames?
In this talk, I’ll use the WebCodecs and Canvas APIs as low-level building blocks to rebuild parts of the <video> and MSE APIs. Using that implementation, I’ll walk through some interesting bits of the playback and buffering logic, and explain what a streaming video player should look out for when using these APIs. Some of the topics that I’ll cover include:
* How seeking works, how the GOP size affects how fast you can seek, and why you should always put a keyframe at the start of each fMP4 segment.
* How quality switching works, and why you should try to align your segment boundaries across all qualities.
* How MSE manages the size of its buffer, and what a player can do to keep this size under control.
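As a flavor of the low-level building blocks the talk refers to, here is a minimal, assumption-laden sketch (not the speaker's code) that decodes video with WebCodecs and paints frames onto a canvas; demuxing fMP4 segments into EncodedVideoChunks is assumed to happen elsewhere, and the codec string is just an example.

```typescript
// Sketch only: a bare-bones "paint engine" built from WebCodecs + Canvas.
function createRenderer(canvas: HTMLCanvasElement, codec = "avc1.42E01E"): VideoDecoder {
  const ctx = canvas.getContext("2d")!;
  const decoder = new VideoDecoder({
    output: (frame: VideoFrame) => {
      ctx.drawImage(frame, 0, 0, canvas.width, canvas.height); // "paint" the decoded frame
      frame.close();                                           // release frame memory promptly
    },
    error: (e) => console.error("decode error", e),
  });
  decoder.configure({ codec });
  return decoder;
}

// Feeding it: each segment should start with a keyframe so decoding (and seeking) has somewhere to begin.
// decoder.decode(new EncodedVideoChunk({ type: "key", timestamp: 0, data: keyframeBytes }));
```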
-
Nicolas Levy
Qualabs
-
Emil Santurio
Qualabs
How we "ScaleUP" the next generation of us: Learning video and a de-LIGHT-ful Player
In the last 12 months at Qualabs, we had to train 50 new video engineers, challenging us to systematize the way we train newcomers. To face this challenge we created "ScaleUp", a team designed to ramp up technical video knowledge and teamwork using the methodology of "learning by doing"!
Developing video engineers is a challenge that we all share in the industry! At Demuxed 2020, Alexandria Shealy and Kevin Fuhrman introduced this topic in the talk "Growing the Next Generation of Us". This time, we will talk about how we have improved the way we develop our next generation and the lessons we have learned by doing this.
We will review various iterations, from "hey!, read this tutorial and tell me what you learned" to today, where the real learning comes from real project stories, running custom-designed projects, and feedback, feedback and more feedback.
In addition, we will present the De-LIGHT-ful Player!, an open-source player created by a "ScaleUp" batch, inspired by Phil Cluff's Smell-O-Vision from Demuxed 2021. We will use it as an example of how to design the learning path, how to use the "aha" moments, the methodology we use during the execution, and of course, a de-light-ful demo.
-
Peter Howard
Practical Applied Strategy
Video Remixing at the Edge
WebAssembly enabling compute at the edge is changing the way we design and deploy digital content and products, and will up-end lots of video delivery use cases.
I argue that we should think of edge computing as a remix layer, enabling new levels of personalisation to deliver more engaging video experiences. In this session I'll share some examples of how this works, and talk to the ways we need to think about designing content — and more particularly, content fragments — to enable these new experiences.
-
Peter Tseng
Eluvio
-
Qingyuan Liu
Eluvio
FFMARK - An Open Framework for Forensic Watermarking
How hard can it be to hide some bits in a video? Long story short, we started a free and open-source project: The Open Framework for Forensic Watermarking.
In the beginning, we tested and compared various video watermarking techniques - including frequency domain algorithms and a machine learning model - based on metrics like performance, imperceptibility, and robustness against attack. We also looked into commercial solutions, but they were not flexible enough for some of our use cases, such as the just-in-time insertion of per-user watermarks for content distribution.
Given the dearth of existing open-source projects, we decided to start one ourselves. Initially, our main implementation is for the DTCWT (dual-tree complex wavelet transform) algorithm, but we hope to have implementations of many other methods that the user can choose from. We want to build a framework that is equally useful to researchers, developers, and content operators. Hopefully we're not oFFmark!
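As a flavor of the problem space (and emphatically not FFMARK's DTCWT method), here is a naive spatial-domain sketch of the core idea: each payload bit nudges the average luma of an 8x8 block onto an even or odd quantization step, a change small enough to be hard to see but recoverable later from decoded frames. All constants are assumptions.

```typescript
// Sketch only: hide one bit per 8x8 block by quantizing the block's average luma.
const STEP = 4; // quantization step in luma units; larger = more robust, more visible

function embedBit(luma: Uint8ClampedArray, width: number, blockX: number, blockY: number, bit: 0 | 1): void {
  // Compute the block's current average luma.
  let sum = 0;
  for (let y = 0; y < 8; y++)
    for (let x = 0; x < 8; x++)
      sum += luma[(blockY * 8 + y) * width + blockX * 8 + x];
  const avg = sum / 64;

  // Snap the average to the nearest quantization step whose parity encodes the bit.
  let target = Math.round(avg / STEP) * STEP;
  if ((target / STEP) % 2 !== bit) target += STEP;
  const delta = target - avg;

  // Apply the same small offset to every pixel in the block.
  for (let y = 0; y < 8; y++)
    for (let x = 0; x < 8; x++) {
      const i = (blockY * 8 + y) * width + blockX * 8 + x;
      luma[i] = Math.max(0, Math.min(255, luma[i] + delta));
    }
}
```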
-
Steve Robertson
Google
Color grading HDR
Last time, we talked at a high level about the challenges facing the consumer HDR ecosystem. Now let's get specific.
Here are our recommendations for exactly how to produce HDR content - workflow, lighting conditions, how to grade on a MacBook, rules of thumb, EOTFs, what looks good and bad, deliverable file formats - all backed up by testing and years of experience.
-
Lionel Bringuier
Videon
Taking the headache out of timed metadata for live video
The trend to merge video and timed metadata is now mainstream, but inherent challenges still exist when it comes to merging multiple live video feeds with multiple sources of timed metadata in the media and entertainment (M&E) space - for example, captioning, digital rights management, and syncing multiple live streams from multiple cameras. This creates barriers for live betting, sports, and events to create better viewing experiences for their end-users.
Why is it a challenge?
Today, many live video operators use HTTP-based OTT workflows, sending video feeds from the camera to the Content Delivery Network (CDN). However, these workflows are subject to latency of up to seven seconds, if not more. They also do not allow live video operators to process the live streams and leverage data without encoding and transcoding them, raising the cost and overall complexity of the workflow.
In addition, workflows generally use SDI VITC timestamps rather than UTC for each frame, creating synchronization discrepancies across metadata sources and camera feeds in different locations, which degrades the overall viewing experience.
How did we solve this?
KLV, a SMPTE data encoding standard also used by the military to embed data in live video feeds, combines metadata with geospatial visualization, offering a new way to enhance the user experience and enabling new use cases such as precise synchronization and timestamping of event highlights across multiple live video streams.
As a practical use case, a precision-timestamped wall clock embedded in live video streams can enable effective sports adjudication, betting, gamification, and more.
Why choose this topic?
Timed metadata has always been a pain in the ass. Right?
Well, we solved that using a military standard, for good. Our mission is to positively impact society by simply moving media.
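For readers who haven't met KLV, it is just key-length-value triplets: a 16-byte SMPTE universal key, a BER-encoded length, then the payload. The sketch below encodes a UTC timestamp that way; the key is a placeholder rather than a registered SMPTE label, and this is an illustration, not anything from the talk.

```typescript
// Sketch only: build one KLV triplet carrying a 64-bit UTC timestamp in microseconds.
const PLACEHOLDER_KEY = new Uint8Array(16).fill(0x06); // real keys come from the SMPTE registry

function berLength(len: number): Uint8Array {
  if (len < 128) return Uint8Array.of(len);            // short form
  const bytes: number[] = [];
  while (len > 0) { bytes.unshift(len & 0xff); len >>>= 8; }
  return Uint8Array.of(0x80 | bytes.length, ...bytes); // long form: count byte, then length bytes
}

function encodeTimestampKlv(utcMicros: bigint): Uint8Array {
  const value = new Uint8Array(8);
  new DataView(value.buffer).setBigUint64(0, utcMicros); // big-endian 64-bit timestamp
  const length = berLength(value.length);
  const out = new Uint8Array(PLACEHOLDER_KEY.length + length.length + value.length);
  out.set(PLACEHOLDER_KEY, 0);
  out.set(length, PLACEHOLDER_KEY.length);
  out.set(value, PLACEHOLDER_KEY.length + length.length);
  return out;
}

// e.g. encodeTimestampKlv(BigInt(Date.now()) * 1000n)
```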
-
Tom Howe
Disney
Panama: Controlling the Video Traffic Flows
As our video catalog grows, so do our delivery costs, distribution complexity, incident mitigation challenges, and more. We developed a system, Adaptive Traffic Routing (ATR), to address these traffic routing needs. Codenamed Panama, the system is designed to balance the following competing goals:
1. High Quality of Service for end users (always the top priority)
2. Cost of CDN usage, particularly when accounting for award-tier pricing
3. Stabilize traffic to all CDNs, especially those running internally or which are particularly sensitive to traffic spikes
4. Minimize the cognitive load on our operators who are trying to accomplish all of the above.
Panama works by ingesting data from multiple sources, analyzing them, and creating a new set of rules for how to route users to CDN endpoints. Panama generates these rules every minute and feeds them to a service at the edge which assigns user requests to specific CDN endpoints. Since these rules are static and precompiled, they provide extremely high performance and scalability.
We strive to address all of the above concerns through our ingest data and our compilation algorithms.
This project has been highly focused on utilizing the data feedback to make the streaming experience more positive for everyone involved.
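As a rough illustration of what static, precompiled routing rules consumed at the edge could look like (our own sketch, not Panama's actual rule format or selection logic):

```typescript
// Sketch only: a per-minute routing table and a weighted CDN assignment at the edge.
interface CdnRule { cdn: string; weight: number; }  // weights within a region sum to ~1
type RoutingTable = Record<string, CdnRule[]>;      // keyed by viewer region

function assignCdn(table: RoutingTable, region: string, sessionHash: number): string {
  const rules = table[region] ?? table["default"];
  // Map a stable per-session hash in [0, 1) onto the cumulative weight distribution,
  // so a session keeps hitting the same CDN while the weights stay fixed.
  let cumulative = 0;
  for (const rule of rules) {
    cumulative += rule.weight;
    if (sessionHash < cumulative) return rule.cdn;
  }
  return rules[rules.length - 1].cdn;               // guard against rounding
}

// Example table, regenerated every minute by the control plane:
const table: RoutingTable = {
  "us-west": [{ cdn: "cdn-a", weight: 0.7 }, { cdn: "cdn-b", weight: 0.3 }],
  "default": [{ cdn: "cdn-a", weight: 1.0 }],
};
```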
-
Vanessa Pyne
Daily
Fake it 'til you make it: Test patterns for fun and profit
Engineers tend to dread testing their software, but I don't understand why, because testing things, especially video on the internet, is super fun! I'm fascinated by the fake media and test patterns we use, and aim to answer these questions in this talk:
Who are the unsung heroes behind bars and tone? Assumed knowledge says Big Buck Bunny is used because it is open source, but is that true? What are the bizarre backstories of test media we rarely give a second thought, like, have you ever visited the website listed in the top left of the Netflix test pattern or seen the WebDriver Torso Youtube account? What generates the fake camera streams in WebDriver flavors of chrome, gecko, and safari? This question in particular led me to dig through "found footage" in Geckodriver source code in a quest to try to find its slow-rainbow-fill-fade fake media stream that always makes me nostalgic for the FBI warning at the beginning of VHS tapes. Don't you want to know what other test media I found??
Inspired by these test pattern origin stories, we'll also cover tips and tricks to automate tests for your WebRTC/live stream app such as how to generate fake media streams via gstreamer, ffmpeg, v4l2loopback, and likely some unholy amalgamation thereof (because it's fun!). We'll talk about virtual capture devices at the browser (think `--use-fake-device-for-media-stream` & `--use-file-for-fake-video-capture` in Chromedriver) and (mac)OS level. Finally, we'll fake out the Web APIs `enumerateDevices` and `getUserMedia` to boot. After this talk, you'll never look at a test pattern the same way.
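In the spirit of that getUserMedia fake-out, here is a rough sketch under our own assumptions (not the speaker's code): replace `navigator.mediaDevices.getUserMedia` with a canvas-generated test pattern so WebRTC code under test gets a predictable stream without a real camera.

```typescript
// Sketch only: a fake camera backed by a canvas test pattern.
function installFakeCamera(width = 640, height = 480, fps = 30): void {
  const canvas = document.createElement("canvas");
  canvas.width = width;
  canvas.height = height;
  const ctx = canvas.getContext("2d")!;

  let hue = 0;
  setInterval(() => {                      // slow rainbow fill, VHS-warning nostalgia optional
    ctx.fillStyle = `hsl(${hue}, 100%, 50%)`;
    ctx.fillRect(0, 0, width, height);
    hue = (hue + 1) % 360;
  }, 1000 / fps);

  // Anything calling getUserMedia from now on receives the canvas stream instead of a camera.
  navigator.mediaDevices.getUserMedia = async () => canvas.captureStream(fps);
}
```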
-
Vittorio Giovara
Vimeo
Video Encoding in Virtual Reality: Spatial, 360, and Ambisonics
If you spend too much time in VR and often wonder how you ended up turning off your headset at 3am *again*, this talk is for you!
This presentation will provide an overview of VR technologies, focusing on the video encoding standards that emerged over the last few years, and touching on the audio aspects with Ambisonics, of course.
In particular we'll review the Spatial Metadata standard, showcasing its creation and adoption, followed by the various spatial formats available with pros and cons. We'll also review the new signaling options for Ambisonics, and the new FFmpeg API that can be used for it.
In order to be fully meta, this talk will be recorded and streamed from a VR environment.
-
Walker Griggs
Mux
Timestamp Troubles: How Mux handles unreliable system clocks in virtual environments
Video is hard, and reliable timestamps in increasingly virtual environments are even harder.
We at Mux recently broke ground on a new live video experience, one that takes a website URL as input and outputs a livestream. We call it Web Inputs. As with any abstraction, Web Inputs hides quite a bit of complexity, so it wasn’t long before we ran up against our first “unexpected behavior”: our audio and video streams were out of sync.
This talk walks you through our experience triaging our timestamp troubles. It’s a narrative account that puts equal weight on the debugging process and the final implementation, and aims to leave the audience with a new perspective on the triage process.
I hope you’ll learn from our mistakes, a bit about Libav audio device decoders, and hopefully a new pattern for web-to-video streaming.
-
Will Law
Akamai
Clever Monkeys Communicating Discreetly
This talk examines the simple elegance of the CMCD solution and its rapid growth and deployment over the past year. Did you know that 25% of all video played back in the USA already sends CMCD data? Under the dual lenses of performance improvement and real-time data visibility, we'll examine why and how content distributors are using it and why CDNs want to receive it. We'll look at its use in novel research projects, open-source deployments, and player implementations, take a peek into use cases coming with the next version, and examine how the upcoming CMSD spec completes the data flow.
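For context, CMCD data most commonly rides along with each media request as a query parameter the CDN can log. The sketch below (an illustration, not Akamai's or any particular player's implementation) builds that parameter from a few real CMCD keys: br (encoded bitrate, kbps), bl (buffer length, ms), mtp (measured throughput, kbps), sf (streaming format) and sid (session id).

```typescript
// Sketch only: attach a CMCD query parameter (CTA-5004) to a segment request URL.
function withCmcd(segmentUrl: string, data: { br: number; bl: number; mtp: number; sid: string }): string {
  const cmcd = [
    `br=${data.br}`,
    `bl=${data.bl}`,
    `mtp=${data.mtp}`,
    `sf=h`,                            // "h" = HLS
    `sid="${data.sid}"`,               // string values are quoted
  ].sort().join(",");                  // keys in alphabetical order, comma-separated
  const url = new URL(segmentUrl);
  url.searchParams.set("CMCD", cmcd);  // URL-encoding handled by searchParams
  return url.toString();
}

// withCmcd("https://cdn.example.com/seg_42.m4s", { br: 3000, bl: 9500, mtp: 25000, sid: "abc-123" })
```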
-
Yuriy Reznik
Brightcove
Origins of DCT, Zigzag scan, I-, P-, B-frames, GOPs, and some other fun things in video
In this talk, I will focus on the history of image and video coding algorithms and trace the origins of many key design decisions that define the architectures of modern video codecs.
Among the things I will review are:
* Origins of DCT (Fourier 1822(!), Ahmed 1972, Ahmed, Natarajan, & Rao 1974)
* Origins of Zigzag scan (G. Cantor, 1873, W. Lukosz 1962, A. Tesher 1973)
* Early transform-based codecs (A. Tescher 1973, Cox & Tescher 1976, Chen & Smith 1976, Kamangar & Rao, 1981, CLI TI system 1982, JPEG - 1991)
* Origins of predictive coding and P frames:
- DPCM (C. Cutler, 1950, DPCM NTSC codec – R. Brainard & A. Netravali, 1982)
- “Conditional replenishment” (CCITT H.120, 1984)
- “Frame difference coding” (CCITT H.120 v2, 1988)
- first “motion compensation-based” codecs (CCIR 721, 723, CCITT H.261, 1989-1991)
* Origins of B frames (T. Micke 1986, A. Puri, B. Haskell, et al. 1990s, MPEG-1 1993)
* Origins of the GOP concept (Nagata et al., 1990, MPEG-1 1993)
And while this material is usually highly technical and mathematical, I’ll try to present it all in a fun and simple fashion, accessible to a broad audience.
-
Zoe Liu
Visionular
-
Thomas Davies
Visionular
Effective Per-Title Encoding For UGC Videos Using Machine Learning
Per-title encoding, first proposed by Netflix [1], aims to achieve the best visual quality subject to a predefined maximum bitrate constraint for any arbitrary video content. Ideally, the quality-bitrate convex hull of a given video should be obtained by encoding the video with typical (bitrate, resolution) ladders and drawing the resulting quality-bitrate curves.
For UGC (User Generated Content) videos, it is not practical to obtain the convex hull of every single video, as the volume of UGC to process is usually enormous. Meanwhile, individual UGC videos have to be processed sufficiently fast. Hence, how to derive a per-title-like approach for UGC has become a challenging but fairly attractive research topic. Quite a few state-of-the-art approaches have been proposed, in particular featuring the use of machine learning.
In this talk, we first outline the typical UGC per-title problem as follows:
1. A set of (resolution, bitrate) ladders are predefined;
2. A maximum bitrate that indicates the real-time bandwidth constraint is specified;
3. We need to decide (a) which resolution to choose, and (b) which CRF value to configure for the encoder, so that the encoding bitrate satisfies the maximum bitrate constraint while achieving the best possible visual quality.
We have practiced the following approach for UGC per-title-like encoding:
Step 1: Extract spatial / temporal features for a given UGC video.
Step 2: Pre-train a machine-learning model to map the extracted spatial/temporal features from Step 1 to the triplet (bitrate, VMAF, CRF) for all predefined resolutions.
Step 3: For a given maximum bitrate constraint, based on the predefined (bitrate, resolution) ladders, exploit the machine learning model to predict the chosen resolution and the encoder CRF parameter.
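A sketch of how Step 3 could look in code, under stated assumptions (the prediction interface and the `model.predict` call are hypothetical, standing in for whatever the trained model exposes):

```typescript
// Sketch only: choose the (resolution, CRF) pair with the best predicted quality under the bitrate cap.
interface Prediction { resolution: string; crf: number; bitrateKbps: number; vmaf: number; }

function choosePerTitleParams(
  predictions: Prediction[],   // one entry per predefined ladder rung, from the ML model
  maxBitrateKbps: number,      // the real-time bandwidth constraint
): Prediction | undefined {
  return predictions
    .filter((p) => p.bitrateKbps <= maxBitrateKbps)
    .sort((a, b) => b.vmaf - a.vmaf)[0];   // best predicted VMAF within the budget
}

// e.g. choosePerTitleParams(model.predict(features), 4500) -> { resolution: "1080p", crf: 27, ... }
```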
Using the above approach, we may effectively resolve the following during multi-rendition adaptive encoding/transcoding:
(1) The VMAF-bitrate curve usually levels off once the bitrate increases beyond a certain level. Sometimes an unnecessarily large bitrate is spent chasing a VMAF score that is already very high; the target VMAF can be lowered slightly while producing a much lower bitrate.
(2) The predefined (bitrate, resolution) ladders may be over-defined for a certain video category, meaning too many ladders have been pre-defined. For certain UGC video categories, some (bitrate, resolution) ladders can be removed in advance, which helps significantly speed up the per-title processing.
Overall, we will demonstrate that machine-learning-based per-title encoding can be efficiently and effectively applied to UGC videos. It not only achieves a better visual experience at a lower bitrate, but also runs at fairly low computational complexity.
Reference:
[1] Anne Aaron, Zhi Li, Megha Manohara, Jan De Cock and David Ronca, "Per-Title Encode Optimization", Originally published at techblog.netflix.com on December 14, 2015.

Our Speakers
-
Alex Converse
Twitch
-
Alex Zambelli
Warner Bros. Discovery
-
Ali C. Begen
Ozyegin University / Comcast
-
Amy Rice
syd<video>
-
Christian Feldmann
Bitmovin
-
Christoph Guttandin
Media Codings
-
Cise Midoglu
Simula Research Laboratory
-
Dan Jenkins
Broadcaster VC
-
Dan Sparacio
Paramount
-
Derek Buitenhuis
Vimeo
-
Dylan Armajani
Akamai
-
Emil Santurio
Qualabs
-
Emmanuel Papirakis
Amazon
-
Guillaume Bichot
Broadpeak
-
Hadi Amirpour
Universität Klagenfurt
-
Hojatollah Yeganeh
SSIMWave
-
Jean-Baptiste Kempf
Videolan
-
Joey Parrish
-
Kieran Kunhya
Open Broadcast Systems
-
Leon Lyakovetsky
Podium
-
Lionel Bringuier
Videon
-
Marc Höppner
Akamai
-
Mario Guggenberger
Bitmovin
-
Mattias Buelens
THEO
-
Nicolas Levy
Qualabs
-
Nidhi Kulkarni
Mux
-
Peter Howard
Practical Applied Strategy
-
Peter Tseng
Eluvio
-
Qingyuan Liu
Eluvio
-
Steve Robertson
Google
-
Thomas Davies
Visionular
-
Tom Howe
Disney
-
Vanessa Pyne
Daily
-
Vittorio Giovara
Vimeo
-
Walker Griggs
Mux
-
Will Law
Akamai
-
Yuriy Reznik
Brightcove
-
Zoe Liu
Visionular

The Schedule
-
Breakfast
-
Opening Remarks
Matt McClure
-
Christoph Guttandin
Media Codings
-
Joey Parrish
-
Vanessa Pyne
Daily
-
Break
-
Dan Sparacio
Paramount
-
Zoe Liu
Visionular
Thomas Davies
Visionular
-
Tom Howe
Disney
-
Lunch, sponsored by:
-
Mattias Buelens
THEO
-
Guillaume Bichot
Broadpeak
-
Nicolas Levy
Qualabs
Emil Santurio
Qualabs
-
Hadi Amirpour
Universität Klagenfurt
-
Break
-
Marc Höppner
Akamai
-
Peter Howard
Practical Applied Strategy
-
Jean-Baptiste Kempf
Videolan
-
Mario Guggenberger
Bitmovin
-
Break
-
Ali C. Begen
Ozyegin University / Comcast
-
Peter Tseng
Eluvio
Qingyuan Liu
Eluvio
-
Hojatollah Yeganeh
SSIMWave
-
Will Law
Akamai
-
Breakfast
-
Alex Zambelli
Warner Bros. Discovery
-
Emmanuel Papirakis
Amazon
-
Vittorio Giovara
Vimeo
-
Leon Lyakovetsky
Podium
-
Break
-
Nidhi Kulkarni
Mux
-
Christian Feldmann
Bitmovin
-
Kieran Kunhya
Open Broadcast Systems
-
Cise Midoglu
Simula Research Laboratory
-
Lunch, sponsored by:
-
Lightning Talks
-
Amy Rice
syd<video>
-
Dylan Armajani
Akamai
-
Break
-
Yuriy Reznik
Brightcove
-
Dan Jenkins
Broadcaster VC
-
Alex Converse
Twitch
-
Break
-
Derek Buitenhuis
Vimeo
-
Lionel Bringuier
Videon
-
Walker Griggs
Mux
-
Steve Robertson
Google
-
Surprise
-
Closing Remarks
Matt McClure
-
Afterparty


About us
Demuxed is simply engineers talking about video technology. After years of chatting about video at the SF Video Technology Meetup, we decided it was time for an engineer-first event with quality technical talks about video. Our focus has traditionally been on content delivered over the web, but topics cover anything from encoding to playback and more!
Most of the organization and work behind the scenes is done by folks from Mux (Demuxed came first ☝️) but none of this would be possible without amazing people from the meetup.
Every year we get a group together that's kind enough to do things like schedule planning, help brainstorm cool swag, and, most importantly, argue heatedly over which talk submissions should make the final cut.


