Computer scientists at Nanjing University and The University of Sydney have unveiled an advanced AI system, known as A2, designed to detect and validate vulnerabilities in Android applications. The system builds upon its predecessor, A1, by incorporating an automated validator, setting a new standard for accuracy and efficiency in vulnerability detection.
AI-driven Vulnerability Identification
A2 mimics the way human bug hunters work, first discovering flaws and then validating them. It reportedly achieves 78.3% coverage on a benchmark, compared with 30.0% for existing static analyzers such as APKHunt. Applied to 169 production APKs, the system identified 104 true-positive zero-day vulnerabilities, 57 of which it self-validated with automatically generated proof-of-concept exploits. One flaw was a medium-severity intent redirect in an app with over 10 million installs, illustrating the system's ability to uncover significant security risks.
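To put the reported figures side by side, the short calculation below derives a few ratios from them. It is only arithmetic on the numbers quoted above; the variable names are illustrative and the derived values are not claims made by the researchers.

```python
# Figures as reported in the article; variable names are illustrative.
a2_coverage = 78.3        # percent benchmark coverage reported for A2
apkhunt_coverage = 30.0   # percent benchmark coverage reported for APKHunt
true_positives = 104      # true-positive zero-days found in 169 production APKs
self_validated = 57       # findings confirmed by an auto-generated proof of concept

# How many times larger A2's reported coverage is than APKHunt's.
coverage_ratio = a2_coverage / apkhunt_coverage

# Share of true positives that A2 validated on its own.
validated_share = self_validated / true_positives

# True positives that were not self-validated.
unvalidated = true_positives - self_validated

print(f"coverage ratio: {coverage_ratio:.2f}x")
print(f"self-validated share: {validated_share:.1%}")
print(f"not self-validated: {unvalidated}")
```

On these figures, A2's reported coverage is roughly 2.6 times APKHunt's, and a little over half of the true positives came with a working proof of concept.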
Innovative Use of Large Language Models
The strength of A2 lies in its integration of multiple Large Language Models (LLMs), namely OpenAI o3, Gemini 2.5 Pro, Gemini 2.5 Flash, and GPT oss, each assuming specific roles such as planner, executor, and validator. The validator plays a crucial role by generating tests and verifying results, substantially reducing the incidence of false positives compared to traditional tools.
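The division of labor among planner, executor, and validator can be pictured with a minimal sketch. This is not A2's code, which has not been released publicly: the `Finding` class, the `run_pipeline` function, and the stub roles are all hypothetical, and plain Python functions stand in for the LLM-backed components.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Finding:
    """A candidate vulnerability moving through the pipeline (hypothetical type)."""
    description: str
    steps: List[str] = field(default_factory=list)     # plan from the planner
    evidence: List[str] = field(default_factory=list)  # outputs from the executor
    validated: bool = False


def run_pipeline(
    finding: Finding,
    planner: Callable[[str], List[str]],
    executor: Callable[[str], str],
    validator: Callable[[str, str], bool],
) -> Finding:
    """Plan the steps, execute each one, then accept the finding only if
    every step's output passes the validator's check."""
    finding.steps = planner(finding.description)
    finding.evidence = [executor(step) for step in finding.steps]
    # An empty plan never counts as validated.
    finding.validated = bool(finding.steps) and all(
        validator(step, output)
        for step, output in zip(finding.steps, finding.evidence)
    )
    return finding


# Stub roles standing in for LLM calls (assumptions for illustration only).
def stub_planner(description: str) -> List[str]:
    return [f"reproduce: {description}", "check observable effect"]


def stub_executor(step: str) -> str:
    return f"ran <{step}>"


def stub_validator(step: str, output: str) -> bool:
    # A real validator would generate and run a test; this one only checks
    # that the executor's output refers to the step it was given.
    return step in output


result = run_pipeline(
    Finding("intent redirect"), stub_planner, stub_executor, stub_validator
)
print(result.validated)
```

The point of the design is that a finding is reported as confirmed only after an independent check passes for every step, which is how a validator role can cut false positives relative to a detector working alone.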
Case Study and Cost-effectiveness
The authors illustrate A2’s capabilities with a task from the Ghera dataset, in which the system automatically validates a three-step exploit for a vulnerability. A2 handles diverse vulnerability classes, and its detection-only cost varies with the model used. The aggregate cost of full validation remains competitive, suggesting the system is economically viable in a range of application scenarios.
Implications for the Security Landscape
The researchers emphasize A2's potential to disrupt traditional bug bounty economics: a medium-severity vulnerability can earn several hundred to several thousand dollars in a bounty program. However, they caution that bug bounties cover only a fraction of apps, which may give malicious actors an incentive to exploit flaws in uncovered apps rather than report them. The researchers anticipate a rise in both defensive measures and offensive cyber activity as a result.
Expert Insights and Future Directions
Adam Boynton of Jamf highlights the significance of AI-driven discoveries, noting that they streamline the path from detection to proof-based validation, minimize false positives, and enable faster remediation. A2's source code and artifacts are currently restricted to support responsible disclosure, but they remain available to affiliated researchers who state a specific purpose.



