The AI Field Guide / J

Letter J

1 term, explained without the techno-murk.

/

Jailbreak

Everyday

A prompt or technique intended to bypass an AI system's safety rules.

Jailbreaks exploit gaps between what a model can do and what its safeguards allow. Developers study them to improve safety, while malicious use can produce harmful content or actions.

For example

A user hides a disallowed request inside a role-play scenario to try to evade a filter.

#