The US government wants an AI capable of identifying the author of a text • The Register

The US intelligence community has launched a program to develop artificial intelligence that can both determine the authorship of anonymous writing and disguise an author’s identity by subtly altering their words.

The Intelligence Advanced Research Projects Activity’s (IARPA) Human Interpretable Attribution of Text Using Underlying Structure (HIATUS) program aims to create software capable of performing “linguistic fingerprinting,” the Office of the Director of National Intelligence (ODNI) said.

“Humans and machines produce large amounts of textual content every day. Text contains linguistic features that can reveal the identity of the author,” IARPA said [PDF].

With the right model, IARPA believes it can identify consistencies in a writer’s style across different samples, modify those linguistic patterns to anonymize the writing, and do it all in a way that is explainable to novice users, ODNI said. HIATUS AIs should also be language-agnostic.

“We have a strong chance of achieving our goals, providing much-needed capabilities to the intelligence community and significantly expanding our understanding of human language variation using the latest advances in computational linguistics and deep learning,” said HIATUS program director Dr. Timothy McKinnon.

To develop robust models, HIATUS plans to treat its goals as an adversarial AI problem: authorship attribution and text anonymization are two sides of the same coin, so the HIATUS research teams will be pitted against each other.

“Attribution systems are rated on their ability to match items by the same author across large collections, while privacy systems are rated on their ability to thwart attribution systems,” IARPA said.
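
The scoring scheme IARPA describes can be illustrated with a toy sketch. This is not HIATUS’ method: the character trigram features, the cosine similarity measure, the function names, and the sample strings are all assumptions chosen for illustration, and real systems would use neural language models and large collections of text. The sketch only shows how one number, attribution accuracy, can score both sides: higher is better for the attribution system, lower is better for the privacy system.

```python
from collections import Counter
from math import sqrt


def char_ngrams(text, n=3):
    """Count overlapping character n-grams (an assumed, simple style feature)."""
    text = text.lower()
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a if k in b)
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def attribute(query, known):
    """Return the author in `known` whose sample is most similar to `query`."""
    q = char_ngrams(query)
    return max(known, key=lambda author: cosine(q, char_ngrams(known[author])))


def attribution_accuracy(queries, known, privacy=None):
    """Fraction of (text, true_author) pairs that are matched correctly.

    If `privacy` is given, every query is rewritten by it first, so a lower
    score means the privacy system is doing better.
    """
    hits = 0
    for text, author in queries:
        if privacy is not None:
            text = privacy(text)
        hits += attribute(text, known) == author
    return hits / len(queries)


if __name__ == "__main__":
    # Deliberately artificial samples so the outcome is easy to verify by hand.
    known = {"alice": "aaaa bbbb aaaa", "bob": "xyz xyz xyz"}
    queries = [("aaaa bbbb", "alice"), ("xyz xyz", "bob")]

    # Attribution system on unmodified text.
    print(attribution_accuracy(queries, known))  # 1.0

    # A degenerate, hypothetical "privacy system" that rewrites everything
    # in bob's style, so alice's text is no longer attributed to her.
    print(attribution_accuracy(queries, known, privacy=lambda text: "xyz xyz"))  # 0.5
```

In this framing the two teams optimize the same metric in opposite directions, which is what makes the setup adversarial.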

The agency said it also plans to develop explainability standards for HIATUS AIs.

McKinnon said part of what HIATUS is doing is trying to demystify some of the unknowns around neural language models (central to HIATUS’ efforts), which he said work well but are essentially black boxes: their developers do not know why they make a particular decision.

Ideally, McKinnon said, “when we do authorship attribution or privacy, we’re able to really understand why the system is behaving the way it is, and can verify that it’s not picking up on spurious things and that it’s doing the right thing.”

If successful, HIATUS could have far-reaching impacts, ranging from countering foreign influence activities to identifying counterintelligence risks and protecting authors whose work could put them at risk, the ODNI said. McKinnon added that HIATUS AIs may also be able to identify whether text is machine-generated rather than human-authored.

About 70% of IARPA’s completed research goes to other government partners for implementation, in which IARPA is not involved: it only develops the technology and does not turn it into something usable. That said, the odds are in HIATUS’ favor, according to the intelligence agency.

Don’t expect this technology to appear in full form any time soon: now that HIATUS has started, it will be 42 months (three and a half years) until the experiment ends, and only then will other government agencies likely be able to take HIATUS for a spin, if McKinnon and his team are successful. ®