Tutorials:
Tutorial speakers:
- Yipeng Liu, University of Electronic Science and Technology of China (UESTC), Chengdu, China. Email: yipengliu@uestc.edu.cn
- Weiyao Lin, Shanghai Jiao Tong University, Shanghai, China. Email: wylin@sjtu.edu.cn
Tutorial description:
Multimedia signals are often multidimensional, yet traditional machine learning typically represents and processes data using vectors and matrices, which requires unfolding multidimensional signals for computation. However, this vectorization or matricization often discards the inherent multilinear structure of the data, resulting in suboptimal performance. Tensors, as higher-dimensional generalizations of vectors and matrices, naturally represent such data and preserve its intrinsic structure. This tutorial starts with an introduction to tensor computations, followed by a classification of tensor learning methods into three main categories: tensor regression, tensor classification, and deep tensor networks. Practical applications in multimedia data processing are explored through concrete examples. Finally, strategies for efficient implementation are discussed, along with insights into potential future research directions. By the end of the tutorial, participants will have gained a solid understanding of the importance of tensor computation, how to implement it, the types of multimedia data tasks it can address, and the associated benefits and challenges.
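The unfolding step mentioned above can be sketched in a few lines of NumPy; this is a toy illustration of vectorization versus mode-n unfolding, not material from the tutorial itself:

```python
import numpy as np

# Toy 3-way tensor, e.g. frames x height x width.
X = np.arange(24, dtype=float).reshape(2, 3, 4)

# Vectorization flattens every mode into one axis, discarding the
# multilinear structure that tensor methods exploit.
x_vec = X.reshape(-1)                       # shape (24,)

# Mode-0 unfolding keeps mode 0 (frames) as rows.
X0 = X.reshape(2, -1)                       # shape (2, 12)

# Mode-1 unfolding: bring mode 1 to the front, then flatten the rest.
X1 = np.moveaxis(X, 1, 0).reshape(3, -1)    # shape (3, 8)

print(x_vec.shape, X0.shape, X1.shape)
```

Each unfolding is lossless on its own, but any single matrix view fixes one mode and mixes the others, which is why purely matrix-based learning can miss cross-mode structure.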
Tutorial speakers:
- Hadi Amirpour (University of Klagenfurt)
- Christian Timmerer (University of Klagenfurt)
Tutorial description:
This tutorial provides a comprehensive exploration of the HTTP Adaptive Streaming (HAS) pipeline, covering advancements from content provisioning to content consumption. We begin by tracing the history of video streaming and the evolution of video coding technologies. Attendees will gain insights into the timeline of significant developments, from early proprietary solutions to modern adaptive streaming standards like HAS. A comparative analysis of video codecs is presented, highlighting milestones such as H.264, HEVC, and the latest standard, Versatile Video Coding (VVC), emphasizing their efficiency, adoption, and impact on streaming technologies. Additionally, new trends in video coding, including AI-based coding solutions, will be covered, showcasing their potential to transform video compression and streaming workflows.
Building on this foundation, we explore per-title encoding techniques, which dynamically tailor bitrate ladders to the specific characteristics of video content. These methods account for factors such as spatial resolution, frame rate, device compatibility, and energy efficiency, optimizing both Quality of Experience (QoE) and environmental sustainability. Next, we highlight cutting-edge advancements in live streaming, including novel approaches to optimizing bitrate ladders without introducing latency. Fast multi-rate encoding methods are also presented, showcasing how they significantly reduce encoding times and computational costs, effectively addressing scalability challenges for streaming providers.
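The per-title idea can be sketched as a simple selection rule: for each target bitrate, keep the trial encode with the best measured quality. The encodes, bitrates, and quality scores below are made-up illustration data, and the function is a simplified stand-in for convex-hull ladder construction, not any specific provider's method:

```python
# (resolution, bitrate_kbps, quality score, e.g. VMAF) per trial encode
# -- all values are fabricated for illustration.
trials = [
    ("1080p", 4500, 95), ("1080p", 2500, 88), ("1080p", 1200, 72),
    ("720p",  2500, 90), ("720p",  1200, 82), ("720p",   600, 68),
    ("480p",  1200, 78), ("480p",   600, 71), ("480p",   300, 58),
]

def per_title_ladder(trials, target_bitrates):
    """For each target bitrate, keep the affordable encode that
    maximizes measured quality for this particular title."""
    ladder = []
    for target in target_bitrates:
        best = max((t for t in trials if t[1] <= target),
                   key=lambda t: t[2], default=None)
        if best and best not in ladder:
            ladder.append(best)
    return ladder

print(per_title_ladder(trials, [600, 1200, 2500, 4500]))
```

Note how, on this toy data, 720p beats 1080p at 2500 kbps: the content-dependent crossover between resolutions is exactly what per-title encoding exploits over a fixed one-size-fits-all ladder.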
The tutorial further delves into edge computing capabilities for video transcoding, emphasizing how edge-based architectures can streamline the processing and delivery of streaming content. These approaches reduce latency and enable efficient resource utilization, particularly in live and interactive streaming scenarios.
Finally, we discuss the QoE parameters that influence both streaming and coding pipelines, providing a holistic view of how QoE considerations guide decisions in codec selection, bitrate optimization, and delivery strategies. By combining historical context, theoretical foundations, and practical insights, this tutorial equips attendees with the knowledge to navigate and address the evolving challenges in video streaming applications.
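On the consumption side, the adaptation logic that HAS clients run can be illustrated with a minimal throughput-based rule: pick the highest ladder rung that fits within a safety margin of the estimated throughput. This is a deliberately simplified sketch of rate-based adaptation, with an assumed ladder, not a production ABR algorithm:

```python
LADDER_KBPS = [300, 600, 1200, 2500, 4500]   # assumed bitrate ladder

def select_rung(throughput_kbps, safety=0.8):
    """Rate-based ABR sketch: highest rung within a safety margin of
    the estimated throughput; fall back to the lowest rung."""
    affordable = [b for b in LADDER_KBPS if b <= throughput_kbps * safety]
    return max(affordable) if affordable else LADDER_KBPS[0]

print(select_rung(3500))   # within 0.8 * 3500 = 2800 kbps -> 2500
print(select_rung(200))    # nothing affordable -> lowest rung, 300
```

Real clients additionally weigh buffer occupancy, startup delay, and switching frequency, which is where the QoE parameters discussed in the tutorial come into play.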
Tutorial speakers:
- Effrosyni Doutsi, Foundation for Research and Technology – Hellas
- Grigorios Tsagkatakis, Foundation for Research and Technology – Hellas
Tutorial description:
The rapid evolution of video technology has driven the demand for efficient, scalable, and perceptually optimized solutions for video coding and compression. Inspired by the remarkable efficiency of biological systems, particularly the human visual system, this tutorial explores bio-inspired approaches to video coding and quality assessment. It will introduce participants to key concepts in bio-inspired video processing, including neuro-inspired compression techniques, perceptually driven rate-distortion optimization, and video quality assessment methods grounded in models of human perception. The tutorial aims to bridge the gap between biology and state-of-the-art video technology, showcasing applications in streaming, autonomous systems, and low-resource environments.
Tutorial speakers:
- Anil Fernando, Head of the Multimedia Communications Research Group, University of Strathclyde, UK
Tutorial description:
Semantic communication, a concept first discussed by Shannon and Weaver in 1949, classifies communication challenges into three distinct levels: physical, semantic, and effectiveness. The physical problem concerns the accurate and reliable transmission of the raw data content of a message, which led to the development of information theory—a field that has profoundly influenced modern communication technologies. The semantic problem, in contrast, deals with ensuring that the intended meaning or context of a message is accurately delivered to the receiver. Finally, the effectiveness problem focuses on determining whether the message achieves its intended purpose or prompts the desired action from the recipient. While advancements in physical communications have progressed exponentially since the early days of information theory, laying the groundwork for today’s high-performance gaming, entertainment, and media ecosystems, semantic communication has remained underexplored for decades. This stagnation can largely be attributed to the absence of computational and theoretical tools required to implement semantic communication systems effectively. Recent advancements in deep learning, natural language processing (NLP), and computational performance have made it possible to revisit semantic communication as a practical and transformative paradigm. Unlike traditional communication methods that prioritize transmitting raw data with high fidelity, semantic communication focuses on delivering meaning, intent, or relevance while minimizing unnecessary data redundancy. This shift is particularly relevant for addressing modern challenges, such as the growing demand for bandwidth-intensive applications, low-latency connectivity, and efficient energy use in data transmission. Semantic communication enables intelligent and context-aware transmission, making it a promising solution to improve the capacity, scalability, and reliability of current and future communication systems. 
In summary, semantic communication is revolutionizing media compression and transmission by emphasizing meaning over raw data. This paradigm aligns well with the challenges of next-generation networks, such as 5G, 6G, and IoT, which require solutions for bandwidth optimization, latency reduction, and scalability. Its applications span a wide range of fields, including entertainment, gaming, smart devices, and autonomous systems, making it a critical component of future communication systems.
In this tutorial, we delve into how semantic communication concepts can complement conventional multimedia communication systems, with a focus on image and video compression and transmission. Our early experiments and results in this field are highly promising, demonstrating that semantic communication can achieve better-quality reconstructions of images and videos for a given bandwidth compared to state-of-the-art compression techniques like HEIF/JPEG, H.264/H.265/H.266, and AV1/AV2. This improvement is achieved by selectively encoding and transmitting semantically relevant features rather than raw pixel data, effectively optimizing resource utilization. However, significant challenges remain before semantic communication can be widely adopted in commercial applications. These challenges include the development of robust and generalizable semantic models, ensuring compatibility with existing infrastructure, addressing computational complexities, and safeguarding data privacy and security. Additionally, standardized frameworks and protocols for semantic communication are needed to facilitate widespread deployment. We present an overview of the historical background, current state of research, and a future roadmap for leveraging semantic communication in multimedia compression and transmission. By addressing these challenges and exploring its potential, semantic communication is poised to become a cornerstone technology in transforming the way multimedia data is encoded, transmitted, and consumed.
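The "transmit features, not pixels" principle can be illustrated with a deliberately crude stand-in for a learned semantic encoder: a low-rank projection of an image, where the sender transmits a small feature matrix and the receiver reconstructs from it plus a shared basis. The image, basis size, and error threshold below are all toy assumptions, and an SVD is used only as a simple proxy for a learned feature extractor:

```python
import numpy as np

rng = np.random.default_rng(0)
# Toy approximately low-rank "image" (structured signal + mild noise).
rows = np.sin(np.linspace(0.0, 3.0, 32))
cols = np.cos(np.linspace(0.0, 3.0, 32))
img = np.outer(rows, cols) + 0.01 * rng.standard_normal((32, 32))

# Sender: project onto k basis vectors -- the "semantic features".
k = 4
U, s, Vt = np.linalg.svd(img, full_matrices=False)
features = U[:, :k] * s[:k]      # 32 x k payload per image
basis = Vt[:k]                   # k x 32, shared once with the receiver

# Receiver: reconstruct from the compact features + shared basis.
recon = features @ basis

sent = features.size + basis.size
err = np.linalg.norm(img - recon) / np.linalg.norm(img)
print(sent, img.size)            # far fewer values than raw pixels
print(round(float(err), 3))      # small relative reconstruction error
```

A real semantic codec replaces the fixed linear basis with learned, content- and task-aware representations, which is what enables the quality gains over pixel-fidelity codecs described above.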
Tutorial speakers:
- John See, Associate Professor, Heriot-Watt University (Malaysia)
- Jingting Li, Associate Researcher, Institute of Psychology, Chinese Academy of Sciences, China
- Xinqi Fan, Assistant Professor, Manchester Metropolitan University, United Kingdom
Tutorial description:
Facial micro-expression (ME) analysis has emerged as a vibrant interdisciplinary research area, attracting psychologists, neuroscientists, and computer scientists due to its potential applications in clinical diagnosis, forensic investigations, and human behavior understanding. MEs are fleeting, involuntary facial expressions, lasting only 1/25 to 1/5 of a second, which reveal concealed emotions. Advances in computational algorithms, video acquisition technology, and increasing dataset availability have propelled this field forward, with recent years witnessing a surge in research and participation, notably through the last four Micro-Expression Grand Challenges and Workshops held at ACM Multimedia. This tutorial offers a comprehensive overview of ME analysis, exploring its background, databases, and conventional tasks like spotting and recognition, which have thrived on traditional and deep learning-based approaches. Delving into cutting-edge topics, this tutorial highlights innovations in self-supervised learning, ME generation, multimodal learning, and the new paradigms of spot-then-recognize and “in-the-wild” analysis. It also covers psychological aspects and the interplay between macro- and micro-expressions, both essential for addressing real-world challenges. By addressing key challenges and discussing future directions, this tutorial equips participants with a deeper understanding of the field’s current landscape and its promising frontiers in multimedia research.
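The spotting task mentioned above can be sketched in its most basic form: score per-frame facial motion, then flag short above-threshold runs, since MEs last only a few frames at typical frame rates. This is an illustrative toy, not any published spotting method, and the motion scores below are fabricated:

```python
def spot_candidates(motion, threshold, max_len=6):
    """Return (start, end) frame runs where motion exceeds threshold
    and the run is short enough to be a micro-expression candidate
    (roughly 1/25 to 1/5 s, i.e. a few frames at ~30 fps)."""
    runs, start = [], None
    for i, m in enumerate(motion):
        if m > threshold and start is None:
            start = i                       # run begins
        elif m <= threshold and start is not None:
            if i - start <= max_len:        # keep only short bursts
                runs.append((start, i))
            start = None
    if start is not None and len(motion) - start <= max_len:
        runs.append((start, len(motion)))   # run extends to clip end
    return runs

motion = [0.1, 0.1, 0.9, 1.2, 0.8, 0.1, 0.1, 0.1]   # toy per-frame scores
print(spot_candidates(motion, threshold=0.5))        # one short burst
```

In practice the motion score comes from optical flow or learned features, and distinguishing MEs from longer macro-expressions is exactly why the duration constraint, and the macro/micro interplay discussed in the tutorial, matter.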
Tutorial speakers:
- Anthony Trioux, Multimedia Communication Laboratory, School of Telecommunications Engineering, Xidian University, Xi’an, China
- Shiqi Wang, Department of Computer Science, City University of Hong Kong, Hong Kong, China
- (to be determined)
Tutorial description:
This tutorial explores selected recent MPEG efforts in human representation standards through two emerging technologies: Generative Face Video Coding (GFVC) and avatar representation. Both fields address the challenges of creating high-quality, realistic representations of human appearance and behavior, leveraging innovative techniques for enhanced multimedia experiences.
The first part of the tutorial focuses on GFVC, a novel approach employing generative models to achieve ultra-low bitrate video compression while maintaining adequate visual quality. We will first introduce the main GFVC architectures currently under investigation in MPEG, and then present ongoing standardization activities aimed at shaping the future of generative-based face video coding.
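The intuition behind GFVC's ultra-low bitrates can be conveyed with back-of-the-envelope arithmetic: instead of frames, the encoder ships a reference image once plus a handful of facial keypoints per frame, and the decoder's generative model does the rest. All constants below are assumed orders of magnitude for illustration, not values from any MPEG specification:

```python
# Conceptual payload comparison for the GFVC idea -- illustrative only.
W, H = 1920, 1080
BYTES_PER_PIXEL = 1.5          # 4:2:0 YUV, uncompressed
N_KEYPOINTS = 10               # assumed keypoint count per frame
BYTES_PER_KEYPOINT = 8         # assumed: quantized (x, y) + motion params

raw_frame = int(W * H * BYTES_PER_PIXEL)       # uncompressed frame bytes
gfvc_frame = N_KEYPOINTS * BYTES_PER_KEYPOINT  # per-frame keypoint bytes

print(raw_frame, gfvc_frame, raw_frame // gfvc_frame)
```

Even against a conventional codec rather than raw video, the per-frame payload shrinks by orders of magnitude, which is why the fidelity and generalization of the decoder-side generative model become the central standardization questions.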
The second part shifts to avatar-based representations, emphasizing their importance in immersive and interactive applications. Attendees will gain insights into key components such as blendshapes, body joints, and animation controllers, as well as the MPEG MORGAN representation for interoperability. We will also examine recent innovations in animation stream compression, addressing their importance for real-time and scalable deployment.
Designed for researchers, engineers, and students with interests in multimedia processing, generative models, and standardization activities, this tutorial provides a comprehensive overview of MPEG’s latest contributions in those fields. By bridging theoretical concepts and practical applications, attendees will gain a comprehensive understanding of these two technologies and their role in advancing human representation standards.