Whizkids jimmy OpenAI, Google's closed models
2024-03-13 08:34

Boffins have managed to pry open closed AI services from OpenAI and Google with an attack that recovers an otherwise hidden portion of transformer models.

The researchers state: "We thereby confirm, for the first time, that these black-box models have a hidden dimension of 1024 and 2048, respectively. We also recover the exact hidden dimension size of the gpt-3.5-turbo model, and estimate it would cost under $2,000 in queries to recover the entire projection matrix."

While the attack does not completely expose a model, the researchers say that it can reveal the model's final weight matrix - or its width, which is often related to the parameter count - and provides information about the model's capabilities that could inform further probing.
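The article does not spell out the mechanics, but the reason a model's width can leak through its outputs can be sketched with basic linear algebra. If the final layer computes logits as a hidden vector multiplied by a vocabulary-by-hidden-size projection matrix, then a stack of logit vectors has rank no greater than the hidden size, and that rank shows up in its singular values. The Python sketch below is an illustration only, not the researchers' code: the toy "model" is random data, the names `estimate_hidden_dim`, `W`, and `H` are hypothetical, and it assumes full logit vectors are available, which a production API would not normally hand over directly.

```python
import numpy as np


def estimate_hidden_dim(logits: np.ndarray, rel_tol: float = 1e-8) -> int:
    """Estimate the numerical rank of a (num_queries x vocab_size) logit matrix.

    Counts the singular values that are non-negligible relative to the
    largest one. rel_tol is an illustrative threshold, not a tuned value.
    """
    singular_values = np.linalg.svd(logits, compute_uv=False)
    return int(np.sum(singular_values > rel_tol * singular_values[0]))


# Toy stand-in for a black-box model (all sizes are made up for illustration).
rng = np.random.default_rng(0)
hidden_dim, vocab_size, num_queries = 64, 500, 200

W = rng.standard_normal((vocab_size, hidden_dim))   # hidden final projection matrix
H = rng.standard_normal((num_queries, hidden_dim))  # one hidden state per query
logits = H @ W.T                                    # what the attacker observes

print(estimate_hidden_dim(logits))
```

The estimate only works when the number of queries exceeds the hidden size; with fewer rows than the true width, the rank is capped by the number of queries instead.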

"What Google [et al.] did was reconstruct some parameters of the full model by querying it, like a user would. They were showing that you can reconstruct important aspects of the model without having access to the weights at all."

One of the Gladstone report's recommendations is "That the US government urgently explore approaches to restrict the open-access release or sale of advanced AI models above key thresholds of capability or total training compute." That includes "[enacting] adequate security measures to protect critical IP including model weights."

Asked about the Gladstone report's recommendations in light of Google's findings, Harris replied, "Basically, in order to execute attacks like these, you need - at least for now - to execute queries in patterns that may be detectable by the company that's serving the model, which is OpenAI in the case of GPT-4. We recommend tracking high level usage patterns, which should be done in a privacy-preserving way, in order to identify attempts to reconstruct model parameters using these approaches."


News URL

https://go.theregister.com/feed/www.theregister.com/2024/03/13/researchers_pry_open_closed_models/