BitBypass: Binary Word Substitution Defeats Multiple Guard Systems
BitBypass hides one sensitive word as a hyphen-separated bitstream, then uses system-prompt instructions to make the model decode and reinsert it. In testing across five frontier models, this approach substantially reduced refusal rates and bypassed multiple guard layers. All five tested models produced phishing content at rates between 68-92%. If your safety controls assume plain-language detection will catch malicious intent, this research deserves close attention.
Read the full article



