Large language models (LLMs) have become essential tools for organizations, with open-weight models offering more control and flexibility for customization to specific use cases. Last year, OpenAI released its gpt-oss series, including standard and, shortly after, safeguard variants focused on safety classification tasks. We decided to evaluate their raw security posture against adversarial inputs, specifically prompt injection and jailbreak techniques that use procedures such as context manipulation and encoding to bypass safety guardrails and elicit prohibited content. We evaluated four gpt-oss configurations in a black-box setting: the 20b and 120b standard models along with their safeguard 20b and 120b counterparts.
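As an illustration of the encoding techniques mentioned above, here is a minimal, hypothetical sketch (not the actual test harness used in this evaluation) of how an encoding-based probe might wrap a request in Base64 so that a naive keyword filter sees only the encoded payload while the model is asked to decode and act on it:

```python
import base64


def encode_probe(prompt: str) -> str:
    """Wrap a prompt in Base64 with an instruction to decode it first.

    Encoding-based jailbreak attempts rely on the model decoding the
    payload itself, while upstream keyword filters only ever see the
    opaque Base64 string.
    """
    payload = base64.b64encode(prompt.encode("utf-8")).decode("ascii")
    return (
        "Decode the following Base64 string and follow its instructions:\n"
        + payload
    )


# Benign example payload; a red-team harness would substitute test prompts.
probe = encode_probe("Summarize this document.")
print(probe)
```

A black-box evaluation would send such probes to each model configuration and score whether the guardrails still trigger on the decoded intent.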

















