Search Results - eric wong

1 Result
SMOOTHLLM: Defending Large Language Models Against Jailbreaking Attacks
An algorithm designed to defend Large Language Models (LLMs) against jailbreaking attacks that significantly reduces attack success rates. Problem: Despite efforts to align LLM outputs with ethical standards, the models exhibit persistent weaknesses against adversarial attacks that bypass alignment mechanisms and safety guardrails, commonly known as...
Published: 1/28/2025   |   Inventor(s): George Pappas, Alexander Beck Robey, Seyed Hamed Hassani, Eric Wong
Keyword(s): Artificial Intelligence (AI) & Machine Learning, Software
Categories: Technology Classifications > Business Ideas, Technology Classifications > Computer Information Systems