
In 2024, Cognition Labs introduced a product called Devin AI as the “world’s first fully autonomous AI software engineer”. The company claimed that Devin could independently plan, code, debug, and deploy entire applications from natural language prompts: essentially, perform the typical tasks of a developer. The announcement gained significant media attention and suggested a future where AI could fully replace human software developers.
However, on closer examination it became evident that Devin’s capabilities were overstated. While the AI demonstrated some ability to generate code snippets, it struggled with even basic tasks, often producing errors or requiring substantial human intervention. Critics labeled it “publicity hype”, highlighting the dangers of overstating AI’s current capabilities. Ironically, Cognition’s own job openings still list software engineer positions, which suggests the product cannot do what it claims.
There is plenty of marketing hype in software testing too, particularly in test automation. Some AI-powered capabilities are already in production use today and deliver real value. Others are being explored in academic and industry research, with early results but limited adoption. And some widely circulated ideas, such as AI performing exploratory testing entirely on its own, remain pure speculation.
In this series of three articles, I want to explore the landscape of AI in test automation: what’s working, what’s emerging, and what’s still only hype.
What’s really working in AI Test Automation now
AI in software testing isn’t new; it builds on decades of research. Papers on defect prediction, test generation, and test suite optimization have been published since the 1970s. What has changed is the speed at which research ideas move into tools: AI research now makes its way into practice almost instantly, sometimes before it has even been formally peer-reviewed.
That said, not every research breakthrough leads to usable technology. Many techniques that perform well in lab conditions fall short in real-world QA environments, especially when data quality is poor or project constraints are messy. Fortunately, critical evaluations are published just as frequently as optimistic results, which makes it easier to cut through the noise.
This section focuses on what’s currently working in production: AI applications that have been widely adopted, deliver measurable value, and operate reliably across different teams and contexts.
Test case generation from requirements
One of the most widely adopted and reliable uses of AI in test automation today is generating test cases from requirements written in natural language. Tools that support this typically rely on off-the-shelf large language models (LLMs) or custom NLP pipelines, often combined with carefully tuned prompts, to transform specifications (user stories, acceptance criteria, or functional descriptions) into structured manual test cases.
This is especially effective when requirements are well structured and deterministic; human review is still needed, but the approach accelerates documentation and reduces the risk of overlooked scenarios.
Additionally, some tools (our AIDEN, for example) assist in expanding coverage by suggesting edge and negative test cases. These might include scenarios like:
- Leaving a password field empty
- Entering malformed input
- Combining actions in non-obvious ways
This helps diversify the test suite without requiring testers to brainstorm every corner case themselves.
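To make this concrete, here is a minimal sketch of the prompt-driven approach described above. It uses the OpenAI Python SDK; the model name, prompt wording, and output schema are my own illustrative assumptions, not a description of how AIDEN or any particular tool works internally, and the generated cases are drafts that still need review.

```python
# Minimal sketch: acceptance criteria in, structured draft test cases out.
# Assumptions (not from any specific product): OpenAI Python SDK, the "gpt-4o-mini"
# model, and a simple JSON schema with a "test_cases" key.
import json

from openai import OpenAI

client = OpenAI()  # expects OPENAI_API_KEY in the environment

PROMPT = """You are a QA engineer. From the acceptance criteria below, produce a JSON
object with a "test_cases" array. Each item must contain "title", "preconditions",
"steps" (list of strings), and "expected_result". Include at least two negative or
edge cases (empty input, malformed input, unexpected action order).

Acceptance criteria:
{criteria}
"""

def generate_test_cases(criteria: str) -> dict:
    """Ask the model for structured manual test cases; a human still reviews the output."""
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": PROMPT.format(criteria=criteria)}],
        response_format={"type": "json_object"},  # keeps the reply parseable as JSON
    )
    return json.loads(response.choices[0].message.content)

if __name__ == "__main__":
    story = ("As a user, I can log in with email and password; "
             "after 5 failed attempts the account is locked for 15 minutes.")
    for case in generate_test_cases(story)["test_cases"]:
        print(f'- {case["title"]}')
```

In practice, the interesting work is less the API call and more the prompt constraints and the output schema, which are what keep the results reviewable and importable into a test management tool.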
AI-assisted test case conversion (manual to automation)
Another practical application of AI in testing is converting existing manual test cases into automated ones. This works best when the manual cases are clearly structured, follow repeatable steps, and describe deterministic behaviour. In such cases, AI can interpret the logic, translate the steps into executable scripts, and output runnable tests, often without requiring the tester to write any code.
Importantly, this isn’t just about turning text into code. Some tools, including AIDEN in Qase, take it a step further by pre-executing each step during conversion. This helps catch ambiguous instructions or invalid assumptions early, ensuring that the resulting automation is not only syntactically correct, but functionally valid in the target environment.
While human review is still important, especially for risky scenarios and complex flows, this kind of assisted conversion helps teams scale automation without scaling headcount, and without getting bogged down in boilerplate scripting.
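The pre-execution idea is easiest to see in code. The sketch below assumes the manual steps have already been translated (for example, by an LLM) into a simple action list and replays them with Playwright before any automated test is emitted. The URL, selectors, and action schema are hypothetical; this illustrates the principle, not how any specific product implements it.

```python
# Minimal sketch of "convert then pre-execute" validation.
# The demo URL, selectors, and action schema below are made up for illustration.
from playwright.sync_api import sync_playwright

STEPS = [  # structured steps produced from a manual test case
    {"action": "goto",  "target": "https://example.test/login"},
    {"action": "fill",  "target": "#email",    "value": "user@example.test"},
    {"action": "fill",  "target": "#password", "value": "correct-horse"},
    {"action": "click", "target": "button[type=submit]"},
    {"action": "expect_visible", "target": "#dashboard"},
]

def pre_execute(steps: list[dict]) -> list[str]:
    """Run each step in a real browser and collect failures, so ambiguous or
    invalid instructions are caught before an automated test is generated."""
    failures = []
    with sync_playwright() as p:
        page = p.chromium.launch(headless=True).new_page()
        for i, step in enumerate(steps, start=1):
            try:
                if step["action"] == "goto":
                    page.goto(step["target"])
                elif step["action"] == "fill":
                    page.fill(step["target"], step["value"])
                elif step["action"] == "click":
                    page.click(step["target"])
                elif step["action"] == "expect_visible":
                    page.wait_for_selector(step["target"], state="visible", timeout=5000)
            except Exception as exc:  # broken selector, missing page state, etc.
                failures.append(f"step {i} ({step['action']} {step['target']}): {exc}")
    return failures

if __name__ == "__main__":
    problems = pre_execute(STEPS)
    print("ready to emit an automated test" if not problems else problems)
```

If every step executes cleanly, the same structured steps can be rendered into a conventional test script; if not, the failing step points at exactly which instruction was ambiguous or invalid.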
Self-healing locators in UI automation
In UI automation, one of the most common sources of test failures is changes to element identifiers: button IDs, class names, or other changes in the DOM structure. Self-healing locators aim to reduce this brittleness by automatically detecting and adapting to such changes during test execution.
This feature is now production-ready and supported in many commercial tools. It works by using heuristics or lightweight machine learning to match a missing or broken element to a likely replacement based on other attributes (e.g. visible text, position, tag structure, or role). If a match is found with sufficient confidence, the test proceeds without failure.
This can significantly reduce test flakiness and maintenance overhead in regression suites, especially when changes to the UI are cosmetic or involve non-functional restructuring.
That said, self-healing isn’t a silver bullet. It works best when changes are minor and the UI context is stable, and it can introduce false positives: if the wrong element is matched, the test appears to pass when it shouldn’t.
Most mature teams use it alongside good locator strategies, alerts, or human-in-the-loop review for critical paths. As with many AI-driven features in testing, the value lies in reducing noise and repetitive rework, not in eliminating human oversight.
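For intuition, here is a toy version of the matching heuristic: score candidate elements against the last-known snapshot of the missing element and only “heal” when confidence is high enough. The attribute set, weights, and threshold are illustrative assumptions; commercial implementations are considerably more sophisticated.

```python
# Toy illustration of self-healing locator matching: when the recorded selector no
# longer resolves, score candidate elements against the original element's snapshot.
# Attribute set, weights, and threshold are illustrative assumptions.
from difflib import SequenceMatcher

WEIGHTS = {"text": 0.4, "tag": 0.2, "role": 0.2, "position": 0.2}
CONFIDENCE_THRESHOLD = 0.75  # below this, fail the test instead of "healing" it

def similarity(a: str, b: str) -> float:
    return SequenceMatcher(None, a or "", b or "").ratio()

def score(original: dict, candidate: dict) -> float:
    """Weighted similarity between the last-known element snapshot and a candidate."""
    pos_delta = abs(original["x"] - candidate["x"]) + abs(original["y"] - candidate["y"])
    return (
        WEIGHTS["text"] * similarity(original["text"], candidate["text"])
        + WEIGHTS["tag"] * (original["tag"] == candidate["tag"])
        + WEIGHTS["role"] * (original["role"] == candidate["role"])
        + WEIGHTS["position"] * max(0.0, 1.0 - pos_delta / 500)  # nearby elements score higher
    )

def heal(original: dict, candidates: list[dict]) -> dict | None:
    """Return the best replacement, or None so the failure surfaces to a human."""
    best = max(candidates, key=lambda c: score(original, c), default=None)
    if best and score(original, best) >= CONFIDENCE_THRESHOLD:
        return best
    return None

# Example: the submit button's id changed, but its text, tag, and position did not.
original = {"text": "Sign in", "tag": "button", "role": "button", "x": 320, "y": 480}
candidates = [
    {"text": "Sign in", "tag": "button", "role": "button", "x": 322, "y": 478, "css": "#login-submit"},
    {"text": "Forgot password?", "tag": "a", "role": "link", "x": 320, "y": 520, "css": ".forgot"},
]
print(heal(original, candidates))  # -> the "#login-submit" candidate
```

The threshold is the important design choice: set it too low and the wrong element can silently “pass” a test, which is exactly the false-positive risk mentioned above.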
Log anomaly detection and signal extraction
AI-driven log analysis is already making debugging faster and more efficient — but only when used right. In the full whitepaper, I cover how AI helps surface meaningful signals from overwhelming log data and what pitfalls teams should avoid.
Test selection and prioritisation in CI/CD
Not all tests need to run on every commit. AI-powered test selection promises to save CI time, but it comes with its own challenges. Curious how this works in real teams? The whitepaper dives into practical use cases, tools, and when AI-driven prioritisation actually pays off.
Draft generation for test plans and documentation
AI can now help QA teams scaffold test plans, checklists, and reports — speeding up documentation without sacrificing quality. In the whitepaper, I explain how these tools work, their limitations, and where human judgement still matters most.
Takeaway: AI is already making testing better — when used right
The examples above are not theoretical. They’re in production today, helping teams automate faster, maintain more easily, and focus on higher-value work. But this is just one part of the picture.
In the next articles, I’ll cover:
- Emerging AI techniques that show promise (but aren’t quite ready)
- Common AI testing myths and why some ideas are still hype
If you want the full overview, with deeper analysis and references, download the full whitepaper here.