GEM: Optimized Framework for Engaging Multimodal Digital Content Generation

NU 2023-124

INVENTORS

  • Venkatramanan Subrahmanian* (McCormick School of Engineering, Computer Science)
  • Chongyang Gao
  • Natalia Denisenko
  • Soroush Vosoughi
  • Yiren Jian

SHORT DESCRIPTION

For digital advertisers and social media platforms, GEM generates engaging image-text pairs using an iterative, engagement-guided algorithm. It improves image-text coherence, boosts user engagement, and streamlines content production through automation.

BACKGROUND

Online advertising struggles with content that fails to capture user attention. Most current generation systems optimize for image realism or text fluency rather than for engagement. This gap limits campaign impact and raises production costs.

ABSTRACT

GEM is a framework that produces engaging multimodal image-text pairs. It combines a pre-trained engagement discriminator with continuous prompt learning on Stable Diffusion models. The system iteratively refines its outputs, using a language model and CLIP similarity to keep image and text aligned. Experimental results and human evaluations show improved engagement and image-text alignment compared with existing methods. GEM offers a practical solution for creating captivating digital content.
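
For illustration only, the sketch below shows one way the engagement-guided loop described above could be organized; it is not the claimed implementation. The caption generator, image generator, and engagement discriminator are caller-supplied placeholders, and only the CLIP image-text alignment score is computed concretely (via the Hugging Face transformers library).

    # Illustrative sketch of an engagement-guided refinement loop.
    # Assumptions: gen_text, gen_image, and engagement are caller-supplied
    # stand-ins for the language model, diffusion model, and pre-trained
    # engagement discriminator; only the CLIP scoring is concrete here.
    import torch
    from transformers import CLIPModel, CLIPProcessor

    _clip = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
    _proc = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

    def clip_alignment(image, text) -> float:
        """Cosine similarity between CLIP image and text embeddings."""
        inputs = _proc(text=[text], images=image, return_tensors="pt", padding=True)
        with torch.no_grad():
            out = _clip(**inputs)
        img = out.image_embeds / out.image_embeds.norm(dim=-1, keepdim=True)
        txt = out.text_embeds / out.text_embeds.norm(dim=-1, keepdim=True)
        return float((img * txt).sum())

    def refine(topic, gen_text, gen_image, engagement, rounds=5, alpha=0.5):
        """Return the highest-scoring (image, text) pair over several proposals.

        gen_text(topic) -> caption, gen_image(caption) -> PIL image, and
        engagement(image, text) -> float are placeholders for the generative
        models and engagement discriminator described in the abstract.
        """
        best, best_score = None, float("-inf")
        for _ in range(rounds):
            text = gen_text(topic)      # propose (or revise) a caption
            image = gen_image(text)     # render a matching image
            score = engagement(image, text) + alpha * clip_alignment(image, text)
            if score > best_score:
                best, best_score = (image, text), score
        return best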

MARKET OPPORTUNITY

The burgeoning market for generative AI in marketing and advertising is valued at $2.8 billion in 2025 and is projected to surge to $12.5 billion by 2030, expanding at a compound annual growth rate (CAGR) of 34.9%. This growth is fueled by the intense competition for consumer attention and rising customer acquisition costs within the $750 billion global digital advertising industry. The primary market includes digital marketing agencies and in-house brand teams who are under constant pressure to improve campaign performance and return on investment. (Source: Statista, "Digital Advertising - Worldwide.")

DEVELOPMENT STAGE

TRL-4 – Prototype Validated in Lab: Key functions including engagement optimization and image-text alignment have been demonstrated in controlled experimental settings.

APPLICATIONS

  • Generate multimodal image-text ads from a given topic for online advertising.
  • Create engaging post content tailored for social media companies.
  • Produce datasets of potential phishing messages for training detection systems.

ADVANTAGES

  • Explicit engagement optimization: Prioritizes content that captures user attention.
  • Efficient continuous prompt learning: Reduces computational demand by freezing the pretrained model's parameters (see the sketch after this list).
  • Iterative refinement for alignment: Enhances coherence between image and text outputs.
  • Open-vocabulary capability: Generates engaging content across diverse topics.
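
As a rough sketch (not the claimed method), the following shows how continuous prompt learning with frozen parameters can be set up: the pretrained encoder is frozen and only a small set of soft prompt vectors is trained, for example against an engagement objective. The class name, token count, and embedding dimension are illustrative, and the encoder is assumed to accept embedding inputs directly.

    # Illustrative soft-prompt module: the pretrained encoder is frozen and
    # only the learnable prompt vectors receive gradient updates.
    import torch
    import torch.nn as nn

    class SoftPrompt(nn.Module):
        def __init__(self, encoder: nn.Module, n_tokens: int = 8, dim: int = 768):
            super().__init__()
            self.encoder = encoder
            for p in self.encoder.parameters():   # freeze pretrained weights
                p.requires_grad_(False)
            # the only trainable parameters: continuous prompt embeddings
            self.prompt = nn.Parameter(0.02 * torch.randn(n_tokens, dim))

        def forward(self, token_embeds: torch.Tensor) -> torch.Tensor:
            # token_embeds: (batch, seq_len, dim); prepend the soft prompt
            soft = self.prompt.unsqueeze(0).expand(token_embeds.size(0), -1, -1)
            return self.encoder(torch.cat([soft, token_embeds], dim=1))

    # Only the prompt vectors would be passed to the optimizer, e.g.:
    # optimizer = torch.optim.Adam([model.prompt], lr=1e-3)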

IP STATUS

US Patent Application Publication US20250111569A1 (pending)
