Machine Learning Based Video Coding Enhancements for HTTP Adaptive Streaming
ACM MMSys’21 Doctoral Symposium, September 30, 2021
Ekrem Çetinkaya
Christian Doppler Laboratory ATHENA | Alpen-Adria-Universität Klagenfurt | Austria
ekrem.cetinkaya@aau.at | athena.itec.aau.at
All rights reserved. ©2020
Agenda
● Introduction
● Research Questions
● Methodology & Existing Results
● Ongoing & Future Work
● Q & A
Introduction
Content Characteristics (as of 2021 *)
● Video streaming share in the Internet traffic: 82%
● Video streamed every second: 1 million minutes
* Cisco VNI Forecast Highlights (2021)
HTTP Adaptive Streaming (HAS)
[Figure: a client playing a video from a HAS server that offers the same content at 240, 480, 1200, 2500, 3500, and 7000 kbps.]
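The adaptation idea behind the HAS figure can be illustrated with a short example. This is a minimal sketch, assuming a simple throughput-based rule with a safety margin; the function name `select_bitrate`, the `safety` parameter, and the rule itself are illustrative assumptions and are not stated on the slide. Only the bitrate ladder is taken from the slide.

```python
# Bitrate ladder from the slide, in kbps.
LADDER_KBPS = [240, 480, 1200, 2500, 3500, 7000]


def select_bitrate(throughput_kbps, ladder=LADDER_KBPS, safety=0.8):
    """Pick the highest bitrate that fits within a fraction of the throughput.

    Hypothetical rule for illustration: `safety` leaves headroom for
    throughput fluctuations. Falls back to the lowest bitrate if none fits.
    """
    budget = throughput_kbps * safety
    candidates = [b for b in sorted(ladder) if b <= budget]
    return candidates[-1] if candidates else min(ladder)
```

Real players typically combine throughput with buffer level; the sketch only shows why the server has to provide several representations of the same content.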
Video Encoding
[Figure: block diagram of a video encoder with the following blocks.]
● Block Partitioning
● Motion Compensation
● Transformation & Quantization
● Entropy Coding
● Entropy Decoding
● Inverse Transformation & Inverse Quantization
● Inter or Intra Prediction
● Picture Buffer
● In-loop Filtering
Video Codecs
● Advanced Video Coding (AVC), 2003: block size 16x16, quaternary tree, supports up to 4K
● High Efficiency Video Coding (HEVC), 2013: block size 64x64, quaternary tree, supports up to 8K
● Versatile Video Coding (VVC), 2020: block size 128x128, multi-type tree, supports up to 16K and 360° videos
Percentages shown between codec generations on the slide: 170 %, 954 %, 37 %, 35 %
Sources:
C. Feldmann, “State of Compression Standards - VVC”, 2020, https://bitmovin.com/compression-standards-vvc-2020/
Vanne et al., “Comparative Rate-Distortion-Complexity Analysis of HEVC and AVC Video Codecs”, TCSVT, 2012
Video Encoding with Machine Learning
[Figure: the video encoder block diagram (Block Partitioning, Motion Compensation, Transformation & Quantization, Entropy Coding, Entropy Decoding, Inverse Transformation & Inverse Quantization, Inter or Intra Prediction, Picture Buffer, In-loop Filtering) annotated with ML-based tools.]
ML-based tools shown:
● Block Partitioning Decision Prediction
● Optical Flow Detection
● Mode Prediction
● Angular Direction Prediction
● Deblocking with ML
● Denoising with ML
● Super-resolution
Research Questions
RQ-1: How to efficiently provide multi-rate representations over a wide range of resolutions for HAS?
● Why? High-resolution content is getting more common, and the required number of representations for HAS is increasing.
● Literature:
  ○ Choose a reference representation and use its information to speed up the remaining encodings.
RQ-2: How to improve the performance of video codecs using machine learning?
● Why? Codecs are getting more complex, there are many possibilities to apply ML, and there is still much room for improvement.
● Literature:
  ○ ML-based approaches are utilized in video codecs to speed up encoder decisions.
  ○ Some attempts at end-to-end ML-based video codecs.
RQ-3: How to improve the visual quality of videos using machine learning?
● Why? ML-based image restoration methods are improving, but video is mostly ignored. QoE can be increased.
● Literature:
  ○ ML-based refinement techniques are applied.
  ○ Post-processing of decoded frames to improve quality.
  ○ Super-resolution for images and videos.
RQ-4: How to use machine learning to improve perceptual quality assessment for videos?
● Why? Finding a reliable metric for perceptual quality is important, as current objective metrics are problematic.
● Literature:
  ○ An ML model is used in VMAF.
  ○ Several more attempts at no-reference perceptual quality assessment.
Methodology & Existing Results
Design and Abstraction Methodology
● Design: Propose a solution (algorithm, concept, protocol, etc.) for a given problem
● Implement: Prototype software implementation using the proposed solution
● Analyze: Qualitative and quantitative analysis of the solution
Repeat the cycle to improve the solutions.
Fast Multi-rate Encoding (DCC’20)
● State-of-the-art:
  ○ Encode the highest quality [1] or the lowest quality [2] as the reference first, then use this information
● Proposed Method [3]:
  ○ Encode the highest quality first
  ○ Use its information to encode the lowest quality
  ○ Use information from both representations to encode the remaining representations
  ○ Double bound for CTU search ranges
[Figure: dependency diagram over the representations QP1, QPN, QPN-1, QP3, QP2, ...]
[1] Schroeder, Damien, et al. "Efficient multi-rate video encoding for HEVC-based adaptive HTTP streaming." IEEE Transactions on Circuits and Systems for Video Technology 28.1 (2016): 143-157.
[2] B. Guo, Y. Han, J. Wen, "Fast Block Structure Determination in AV1-based Multiple Resolutions Video Encoding," in 2018 IEEE International Conference on Multimedia and Expo (ICME), San Diego, CA, USA, July 2018.
[3] H. Amirpour, E. Çetinkaya, C. Timmerer and M. Ghanbari, "Fast Multi-rate Encoding for Adaptive HTTP Streaming," 2020 Data Compression Conference (DCC), Snowbird, UT, USA, 2020, pp. 358-358.
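The "double bound for CTU search ranges" in the proposed method can be illustrated as follows. This is a minimal sketch, assuming that the depth chosen in the lowest-quality encoding acts as a lower bound and the depth chosen in the highest-quality encoding acts as an upper bound for the intermediate representations; the function names and the per-region depth inputs are hypothetical and are not taken from the paper.

```python
def depth_search_range(depth_highest_quality, depth_lowest_quality, max_depth=3):
    """Return the (min, max) depth range to search for an intermediate representation.

    Assumption for illustration: higher-quality encodings usually split
    deeper, so the two reference depths bracket the search. min/max are
    used so the range stays valid even if that ordering does not hold.
    Depths are clamped to 0..3, the CU depth range of a 64x64 HEVC CTU.
    """
    lo = min(depth_highest_quality, depth_lowest_quality)
    hi = max(depth_highest_quality, depth_lowest_quality)
    return max(0, lo), min(max_depth, hi)


def depths_skipped(depth_highest_quality, depth_lowest_quality, max_depth=3):
    """Count how many of the max_depth + 1 depth levels need not be tested."""
    lo, hi = depth_search_range(depth_highest_quality, depth_lowest_quality, max_depth)
    return (max_depth + 1) - (hi - lo + 1)
```

When both references agree on a depth, the range collapses to a single level and three of the four levels are skipped; when they disagree widely, little is saved.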
Fast Multi-rate Encoding (DCC’20)
[Figure: results.]
Towards Optimal Multirate Encoding (MMM’21)
● State-of-the-art:
  ○ Encode the highest quality [1] or the lowest quality [2] as the reference first, then use this information
● Proposed Method [3]:
  ○ Try different quality levels as the reference representation to determine the best starting point for parallel encoding
  ○ Encode the middle quality first and use its information
  ○ Upper or lower bound depending on the quality level
[Figure: dependency diagram over the representations QPN/2, QPN, QP2, QP1, ...]
[1] Schroeder, Damien, et al. "Efficient multi-rate video encoding for HEVC-based adaptive HTTP streaming." IEEE Transactions on Circuits and Systems for Video Technology 28.1 (2016): 143-157.
[2] B. Guo, Y. Han, J. Wen, "Fast Block Structure Determination in AV1-based Multiple Resolutions Video Encoding," in 2018 IEEE International Conference on Multimedia and Expo (ICME), San Diego, CA, USA, July 2018.
[3] H. Amirpour, E. Çetinkaya, C. Timmerer and M. Ghanbari, "Towards Optimal Multirate Encoding for HTTP Adaptive Streaming," The International MultiMedia Modeling Conference (MMM), Prague, Czech Republic, 2021.
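The "upper or lower bound depending on the quality level" rule can be sketched with a single reference encoding. This is a minimal sketch, assuming that a lower QP means higher quality and that higher-quality encodings tend to split at least as deep as lower-quality ones; the function name and arguments are hypothetical.

```python
def depth_search_range_from_reference(ref_depth, ref_qp, target_qp, max_depth=3):
    """Bound the CTU depth search for a target QP using one reference encoding.

    Assumption for illustration: depth does not decrease as quality increases.
    """
    if target_qp < ref_qp:
        # Target is higher quality than the reference:
        # the reference depth serves as a lower bound.
        return ref_depth, max_depth
    if target_qp > ref_qp:
        # Target is lower quality than the reference:
        # the reference depth serves as an upper bound.
        return 0, ref_depth
    # Same QP: reuse the reference decision.
    return ref_depth, ref_depth
```

With a middle-quality reference, both directions get a one-sided bound, and all dependent encodings can start as soon as that single reference is finished.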
Towards Optimal Multirate Encoding (MMM’21)
[Figure: results.]
Fast Multi-rate Encoding with ML (VCIP’20)
● State-of-the-art:
  ○ Use a CNN to predict CTU depth decisions [1]
● Proposed Method [2]:
  ○ Train a CNN with encoding information obtained from the reference representation and use its decisions to encode the dependent representations
  ○ Focus on parallel encoding, thus apply only in bottleneck situations
  ○ Train different CNNs for different QP targets
[Figure: the representations QPN, QPN-1, ..., QP2, QP1 encoded with HEVC, with CNNs assisting some of the encodings.]
[1] Kim, Kyungah, and Won Woo Ro. "Fast CU depth decision for HEVC using neural networks." IEEE Transactions on Circuits and Systems for Video Technology 29.5 (2018): 1462-1473.
[2] E. Çetinkaya, H. Amirpour, C. Timmerer and M. Ghanbari, “FaME-ML: Fast Multirate Encoding for HTTP Adaptive Streaming Using Machine Learning,” 2020 IEEE International Conference on Visual Communications and Image Processing (VCIP), Macau, 2020, pp. 87-90.
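Why a focus on parallel encoding leads to targeting the bottleneck can be seen with a simple timing model. This is a minimal sketch, assuming every dependent representation starts once the reference is finished and runs on its own core; the function names are hypothetical and the model is an illustration, not the evaluation setup of the paper.

```python
def serial_time(ref_time, dependent_times):
    """Total time when the representations are encoded one after another."""
    return ref_time + sum(dependent_times)


def parallel_time(ref_time, dependent_times):
    """Total time when all dependent encodings run concurrently after the reference.

    The slowest dependent encoding is the bottleneck: speeding up any
    other representation does not shorten the total.
    """
    return ref_time + max(dependent_times, default=0)
```

With a reference time of 50 and dependent times of 40, 25, and 15 (made-up values in arbitrary units), the serial total is 130 and the parallel total is 90. Cutting the 25 down to 10 leaves the parallel total at 90, while cutting the 40 down to 30 lowers it to 80.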
Fast Multi-rate Encoding with ML (VCIP’20)
[Figure: results.]
Fast Multi-rate and Multi-resolution Encoding with ML (IEEE OJ-SP)
● State-of-the-art:
  ○ Use the highest quality representation as the reference [1]
● Proposed Method [2]:
  ○ Train a CNN with encoding information obtained from the reference representation (the highest quality from the lowest resolution) and use its decisions to encode the dependent representations
  ○ Improves parallel encoding as well as serial encoding
  ○ Train different CNNs for different QP and resolution targets
[Figure: encoding scheme across 540p, 1080p, and 2160p. QP1 at 540p is encoded with plain HEVC; the remaining representations (QP1, QP2, ..., QPN at each resolution) are encoded with HEVC assisted by a CNN.]
[1] Schroeder, Damien, et al. "Efficient multi-rate video encoding for HEVC-based adaptive HTTP streaming." IEEE Transactions on Circuits and Systems for Video Technology 28.1 (2016): 143-157.
[2] E. Çetinkaya, H. Amirpour, C. Timmerer and M. Ghanbari, "Fast Multi-Resolution and Multi-Rate Encoding for HTTP Adaptive Streaming Using Machine Learning," in IEEE Open Journal of Signal Processing, vol. 2, pp. 484-495, 2021, doi: 10.1109/OJSP.2021.3078657.
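The multi-resolution scheme above can be sketched as a list of encoding jobs. This is a minimal sketch, assuming one reference (the highest quality at the lowest resolution) encoded with plain HEVC and one CNN per (resolution, QP) target for everything else; the function name, the dictionary layout, and any QP values passed in are hypothetical.

```python
def build_encoding_plan(resolutions, qps):
    """List the encoding jobs for every (resolution, QP) pair.

    Assumes `resolutions` is ordered lowest first and `qps` is ordered
    highest quality first, so the reference is (resolutions[0], qps[0]).
    Every other job is marked as CNN-assisted and keyed by its own target,
    mirroring "different CNNs for different QP and resolution targets".
    """
    reference = (resolutions[0], qps[0])
    plan = []
    for res in resolutions:
        for qp in qps:
            target = (res, qp)
            is_reference = target == reference
            plan.append({
                "target": target,
                "uses_cnn": not is_reference,
                "model_key": None if is_reference else target,
            })
    return reference, plan
```

For three resolutions and three QP levels this yields nine jobs, eight of which depend on the single 540p reference, which is why the reference encoding should be as cheap as possible.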
Fast Multi-rate and Multi-resolution Encoding with ML (IEEE OJ-SP)
[Figure: normalized encoding time for HM 16.21, the lower bound, and FaRes-ML.]
Ongoing & Future Work
Work Plan
[Figure: Gantt chart by quarter, from 2019 Q4 to the thesis in 2023, with rows RQ1 to RQ4.]
Research questions:
1. How to efficiently provide multi-bitrate representations over a wide range of resolutions for HAS?
2. How to improve performance of video codecs using machine learning?
3. How to improve quality of videos using machine learning?
4. How to use machine learning to improve perceptual quality assessment for videos?
Items on the chart:
● Literature review
● DCC’20 Paper: Fast Multi-rate Encoding for Adaptive HTTP Streaming
● MMM’21 Paper: Towards Optimal Multirate Encoding for HTTP Adaptive Streaming
● VCIP’20 Paper: FaME-ML: Fast Multirate Encoding for HTTP Adaptive Streaming Using Machine Learning
● Multi-rate and Multi-resolution Encoding
● IEEE OJSP Paper: Fast Multi-Resolution and Multi-Rate Encoding for HTTP Adaptive Streaming Using Machine Learning
● Super-resolution Literature Review
● Perceptual Quality Assessment with ML
● Bitrate Ladder Prediction
● Literature Review
● Improvement in In-loop Filtering with ML
● Mobile Player Optimization with SR
● Thesis (2023)
Thank you!
ekrem.cetinkaya@aau.at | @ekremcetinkaya_ | linkedin.com/in/ekrcet
