A2, an AI agent developed by researchers at Nanjing University and the University of Sydney, finds vulnerabilities in Android applications and then validates them. It builds on its predecessor, A1, which targeted smart contracts.
Greater Efficacy in Vulnerability Detection
In benchmark testing, A2 reached 78.3% coverage, compared with 30% for the static analyzer APKHunt. In a separate run against 169 real APKs, it identified 104 zero-day vulnerabilities, 57 of which were confirmed by automatically generated exploits. One of them was a medium-severity bug in a popular app with over 10 million downloads, which exposed users to malicious redirection and unauthorized control.
What sets A2 apart from its predecessor is its validation module. Rather than applying a fixed verification scheme, A2 breaks verification into specific tasks and confirms a vulnerability step by step. In one example involving an AES key stored in plain text, A2 extracted the key from strings.xml, used it to generate a forged password reset token, and confirmed that the token bypassed authentication, with an automated check at each step.
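The step-by-step idea can be sketched in a few lines of Python. This is an illustrative toy, not A2's implementation: the `Step` structure, the sample strings.xml content, the key value, and the toy server are all hypothetical, and HMAC-SHA256 stands in for the app's AES-based token scheme only because Python's standard library has no AES.

```python
import hashlib
import hmac
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from typing import Callable, List, Tuple

# Hypothetical stand-in for a decompiled res/values/strings.xml.
STRINGS_XML = """<resources>
    <string name="app_name">DemoApp</string>
    <string name="aes_key">0123456789abcdef</string>
</resources>"""


def sign(key: str, user: str) -> str:
    # Stand-in token scheme (HMAC-SHA256 instead of AES; an assumption).
    return hmac.new(key.encode(), user.encode(), hashlib.sha256).hexdigest()


def toy_server_accepts(user: str, token: str) -> bool:
    # Toy "server" that trusts the same hardcoded key as the app.
    return hmac.compare_digest(token, sign("0123456789abcdef", user))


@dataclass
class Step:
    name: str
    run: Callable[[dict], None]    # performs the action, stores results in ctx
    check: Callable[[dict], bool]  # automated check on that action's result


def extract_key(ctx: dict) -> None:
    node = ET.fromstring(STRINGS_XML).find("./string[@name='aes_key']")
    ctx["key"] = node.text if node is not None else None


def forge_token(ctx: dict) -> None:
    ctx["token"] = sign(ctx["key"], "victim")


def submit_token(ctx: dict) -> None:
    ctx["accepted"] = toy_server_accepts("victim", ctx["token"])


STEPS = [
    Step("extract key", extract_key, lambda c: bool(c.get("key"))),
    Step("forge reset token", forge_token, lambda c: len(c.get("token", "")) == 64),
    Step("bypass authentication", submit_token, lambda c: c.get("accepted") is True),
]


def validate(steps: List[Step]) -> Tuple[bool, List[str]]:
    ctx: dict = {}
    passed: List[str] = []
    for step in steps:
        step.run(ctx)
        if not step.check(ctx):
            return False, passed  # stop at the first step that fails its check
        passed.append(step.name)
    return True, passed


if __name__ == "__main__":
    print(validate(STEPS))
```

A finding counts as confirmed only when every step passes its own check, so a failure at any stage leaves a record of exactly how far the exploit chain got.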
A2's Multi-Model Architecture
A2 combines several large language models, including OpenAI o3, Gemini 2.5 Pro, Gemini 2.5 Flash, and GPT-oss-120b, and assigns each a distinct role: planning, execution, or validation. This division of labor mirrors how a human analyst works (plan, carry out, then confirm). It is meant to reduce false positives and strengthen confirmed findings, an area where traditional tools are often limited by accuracy.
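As a rough illustration of why separating the roles helps, the sketch below routes each suspected finding through a planner, an executor, and a validator, and reports only what the validator confirms. Everything here is hypothetical: the three functions are stubs rather than calls to real models, the article does not say which model fills which role, and the `exploitable` flag is toy ground truth that a real system would not have.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Finding:
    name: str
    exploitable: bool  # toy ground truth; a real system would not know this


def planner(finding: Finding) -> List[str]:
    # Stub planner: turn a suspected issue into ordered tasks.
    return [f"locate {finding.name}", f"exploit {finding.name}"]


def executor(finding: Finding, tasks: List[str]) -> Dict[str, bool]:
    # Stub executor: "locate" always succeeds, "exploit" only if the flaw is real.
    return {t: (not t.startswith("exploit") or finding.exploitable) for t in tasks}


def validator(results: Dict[str, bool]) -> bool:
    # Stub validator: confirm only if every task produced a positive result.
    return bool(results) and all(results.values())


# Hypothetical role table; in a real system each entry would wrap a model call.
ROLES: Dict[str, Callable] = {
    "planner": planner,
    "executor": executor,
    "validator": validator,
}


def confirmed_findings(findings: List[Finding]) -> List[str]:
    confirmed = []
    for f in findings:
        tasks = ROLES["planner"](f)
        results = ROLES["executor"](f, tasks)
        if ROLES["validator"](results):  # unconfirmed findings are dropped
            confirmed.append(f.name)
    return confirmed


if __name__ == "__main__":
    suspects = [Finding("hardcoded key", True), Finding("open redirect", False)]
    print(confirmed_findings(suspects))
```

Because the roles sit behind a lookup table, swapping the model behind one role would not touch the others, which is one plausible reason for splitting the work this way.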
Detection alone costs between $0.0004 and $0.03 per app, depending on the models used. A full detection and verification cycle costs $1.77 on average. Using only Gemini 2.5 Pro raises that to $8.94 per identified bug. For comparison, GPT-4 has been reported to create an exploit from a vulnerability at a cost of around $8.80.
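A little arithmetic puts these figures side by side. The sketch below uses only the numbers quoted in this article. Applying the per-app range to the 169-APK test set is an illustrative assumption, not a reported total.

```python
# Figures quoted in the article (USD).
COST_PER_APP_LOW = 0.0004
COST_PER_APP_HIGH = 0.03
FULL_CYCLE_AVG = 1.77         # average full detection and verification cycle
GEMINI_PRO_ONLY = 8.94        # per identified bug, Gemini 2.5 Pro alone
APPS = 169                    # size of the real-APK test set

# Assumption for illustration: the per-app range applies to all 169 APKs.
total_low = round(COST_PER_APP_LOW * APPS, 4)
total_high = round(COST_PER_APP_HIGH * APPS, 4)

# How much more the single-model configuration costs than the average cycle.
ratio = GEMINI_PRO_ONLY / FULL_CYCLE_AVG

if __name__ == "__main__":
    print(f"Per-app cost over {APPS} apps: ${total_low} to ${total_high}")
    print(f"Gemini 2.5 Pro only vs. average cycle: {ratio:.2f}x")
```

Under that assumption the per-app cost for the whole set stays between roughly 7 cents and 5 dollars, and relying on Gemini 2.5 Pro alone costs about five times the average full cycle.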
A2 outperforms Android static analyzers and could speed up both defensive and offensive security research. Some experts, however, worry that bug bounty programs may not cover all of the flaws such tools uncover. Access to A2's source code is currently restricted to partnered researchers, a compromise between openness and responsible disclosure.



