At AppAchhi, we built a bot that tests apps. We are feeding it (through programming) as much intelligence as we can to make it smarter. The bot behaves, in most cases, the way we designed it to. After our bot reached a point of stability, we started laying the foundation for adding the “Learnability” / Artificial Intelligence / Machine Learning layer to it. Through this post, I want to share our journey towards A.I. Driven Functional Testing, along with my commentary on what we learned along the way.

Functional Testing ==> y = H(f(x)); O(y);

In our quest towards A.I. for testing – we discovered a number of fundamental testing problems. The discovery helped us evolve our understanding of functional testing.

  • Functional testing looks like: y = H(f(x)); O(y); (a minimal code sketch of this notation follows the list)
  • f = function under test
  • x = input
  • y = output
  • H = Heuristic (in simple terms, the test or an experiment)
  • O = observation and oracle (in simple terms, the expected result)
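
To make the notation concrete, here is a minimal sketch of y = H(f(x)); O(y) in Python. The function under test, the input and the expected value are all made up for illustration; this shows only the shape of the idea, not our bot’s implementation.

    # Minimal sketch of y = H(f(x)); O(y).
    # f: the function under test (here, a made-up discount calculator).
    # H: the heuristic / experiment that exercises f with an input x.
    # O: the observation and oracle that judges the output y.

    def f(price, coupon):                 # function under test (illustrative)
        return price * 0.9 if coupon == "SAVE10" else price

    def H(f, x):                          # the test or experiment
        price, coupon = x
        return f(price, coupon)           # y = H(f(x))

    def O(y, expected):                   # observation + oracle
        return "PASS" if y == expected else "FAIL"

    x = (100, "SAVE10")
    y = H(f, x)
    print(O(y, expected=90.0))            # PASS, under this simple oracle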

How do humans perform Functional Testing?

A human figures out f, x and y even when none of them is presented at the start of testing; all the human is given is the application to be tested.

The process of figuring out f, x, and y by a human is an outcome of talking to the humans who are building the product, engaging in email conversations with them, reading and writing documentation, brainstorming, critical thinking, building hypotheses, making assumptions, reading material online, bringing in past experience and, equally important, the tacit knowledge that comes from working with smart people. These things combine to help a human tester arrive at possible experiments and tests (H) plus expected results (O) that could help in finding functional bugs. The human tester refines the understanding of f, x and y through every iteration of testing.

In most cases, there is a steep learning curve, driven by the complexity and criticality of the software and its application to the real world in a business context. This steep learning curve is the reason a tester who has been on a project for a long time is touted as someone who is valuable and will find more critical functional bugs, faster, than those who are relatively new.

How can A.I. do functional testing?

The H part

The bot could perform the Heuristic part (and a bot is not necessarily A.I.).

Here is how our journey to build a testing bot started:

  • AppAchhi bot version 0.1 worked as a scanner: it scanned through the app and found paths it could traverse. Version 0.2 added a simple programmed heuristic: “click every element recognized” (a minimal sketch of this heuristic appears after this list).
  • With version 0.2 we started to find points in an app that crash on clicks. Useful, but you don’t expect such bugs in apps like Gmail or Facebook. So the market that would benefit from AppAchhi bot version 0.2 would be those who couldn’t afford to test, and we figured they wouldn’t be able to afford us anyway 🙂
  • We wanted to build a bot that applies heuristics helpful in finding problems in apps that are “considered” stable. The foundation for version 0.3 was laid by applying our learning from exploratory testing of the top used apps tested at Moolya for its customers like Flipkart and PayTM. We re-ran a few apps with the “exploratory” bot and saw a huge difference in the crashes reported between v0.3 and v0.2.
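
As a rough illustration, the version 0.2 behaviour amounts to a crawler that clicks every element it can recognize and records where the app crashes. The sketch below is hypothetical: the app driver methods (get_clickable_elements, click, has_crashed, restart) are assumptions for illustration, not our bot’s actual API.

    # Hypothetical sketch of the v0.2 heuristic: "click every element recognized".
    def click_everything(app, max_clicks=500):
        crashes = []
        clicked = set()
        while len(clicked) < max_clicks:
            # Pick an element we have not clicked yet; stop when nothing new appears.
            fresh = [e for e in app.get_clickable_elements() if e.id not in clicked]
            if not fresh:
                break
            element = fresh[0]
            clicked.add(element.id)
            app.click(element)
            if app.has_crashed():
                crashes.append(element.id)   # a point in the app that crashes on a click
                app.restart()
        return crashes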

For instance, the Instagram Android app crashes intermittently if you keep scrolling. Let’s say a memory leak or a buffer overflow is causing the issue. Testers learn from this and apply it to the next app they test that has a carousel. The bot can do that very well too: the next time it finds an app with a carousel, it can be trained to go after it and help us find a similar issue. These pointers, as my testing community friend Alan Richardson (@eviltester) suggested, amount to a Model Based Testing approach. There is no A.I. yet.
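
One way to picture this kind of trained rule is a simple mapping from UI patterns the bot recognizes to heuristics it should apply. Again, a hedged sketch with made-up names (detect_widgets, scroll, has_crashed), not how our bot is actually written.

    # Hypothetical model-based rule: when the bot's model of the app contains a
    # carousel, apply an aggressive-scroll heuristic learned from past crashes.
    HEURISTICS_BY_WIDGET = {
        "carousel": "scroll_hard",    # learned from the Instagram-style crash
        "list": "fling_repeatedly",
    }

    def apply_model_based_heuristics(app):
        findings = []
        for widget in app.detect_widgets():
            if HEURISTICS_BY_WIDGET.get(widget.kind) == "scroll_hard":
                for _ in range(200):          # keep scrolling far longer than a demo would
                    app.scroll(widget)
                    if app.has_crashed():
                        findings.append((widget.kind, "crash while scrolling"))
                        app.restart()
                        break
        return findings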

I was led to believe, like everyone else who does not read deeply enough about A.I., that a bot doing some tests like this on its own is A.I. Later, I discovered that this is, at best, what automation of the pre-A.I. era was supposed to be.

The tough-to-solve “O” problem

If there were a way for humans to learn an oracle without understanding a feature, its purpose, the way it is designed, the choice of programming language and the architecture, and without talking to people, humans would have identified it by now. I think we have established that these are the ways to understand a functionality and test it.

As I run Moolya, I have had the pleasure of listening to a lot of customer problems in testing and solving most of them. Every customer I spoke to who wanted to outsource their Functional Testing was concerned about how our testers would learn the functionality of the app they were building. In testing parlance, they were asking: “how does the tester learn what to input (H) and what to expect (O)?”

The H can come from Model Based Testing driven bots, but where does the O come from?

I think if we put an A.I. in front of the customer, they would evaluate it brutally: trusting a machine to learn functionality, which today is such a complex activity, without any interaction with humans is hard, unless the A.I. proves that distrust wrong. The expectation from an A.I. would be different from the expectation from a human.

Is the best use of A.I. in testing to create an Exploratory Tester out of it and not an SDET?

At AppAchhi we were thinking: what would happen if we started designing A.I. to talk to people to gather those oracles?

All we would have ended up doing is making the A.I. a manual tester. I intentionally used the word manual tester here instead of exploratory, because most people in the world love automation, partly because they think manual testing is less useful at scale. I think it is a worthwhile question to explore: should A.I. in Testing be a Manual (Exploratory) Tester instead of an A.I. Automation Script Writer?

We understood that we were far away from building an A.I. that determines Oracles on its own. Well, some Oracles can be programmed using the Model Based Testing approach and a Comparable Oracle approach, but we did that without needing to implement A.I. / M.L. (a sketch of a comparable oracle follows).
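
For clarity, here is a hedged sketch of what a comparable oracle could look like: instead of knowing the expected result up front, the test compares the output of the build under test against a comparable source of truth, such as a previous build. The two client objects and their search() call are assumptions for illustration.

    # Hypothetical comparable oracle: agreement with a comparable product or an
    # earlier build stands in for an explicit expected result.
    def comparable_oracle(new_build, previous_build, query):
        y_new = new_build.search(query)
        y_old = previous_build.search(query)
        # A mismatch is not automatically a bug, but it is worth a human look.
        return {"query": query, "match": y_new == y_old, "new": y_new, "old": y_old}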

We then understood that before we build a Machine Learning layer that finds Oracles on its own, we needed to bridge a gap within the team: one tester may be using an Oracle that is helpful for finding bugs, and others in the team could use it too. This helped us start laying the foundation for Bug Ninja, which is in its version 0.1, with work on version 0.2 underway.

A Bug Ninja records everything a tester does with an app. In dramatic terms, we want to make Bug Ninja the Iron Man suit and pass the intelligence of one Iron Man in the team to the others.
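
As a hedged sketch of the recording idea, one could imagine a stream of timestamped interaction events that other testers (or a future M.L. layer) can replay or learn from; the event fields here are my assumptions, not Bug Ninja’s actual format.

    # Hypothetical sketch of recording a tester's session as timestamped events.
    import json
    import time

    class SessionRecorder:
        def __init__(self):
            self.events = []

        def record(self, action, target, **details):
            self.events.append({
                "time": time.time(),
                "action": action,    # e.g. "tap", "scroll", "type"
                "target": target,    # e.g. a widget id or screen name
                "details": details,  # e.g. text typed, scroll distance
            })

        def export(self, path):
            with open(path, "w") as f:
                json.dump(self.events, f, indent=2)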

Is A.I. Driven Functional Automation Testing possible?

I have been reviewing a lot of testing work lately from the hundred-plus testers of my own company, and I am also engaged in a review of automation for a very large customer with very large automation teams.

There are two kinds of SDETs:

  1. Type 1: Those who rely on a manual (exploratory) tester to come up with the tests; the SDET then scripts the checks part of them.
  2. Type 2: Those who have the ability to understand the system, its flows and its functioning; they write their own tests based on that understanding and then write the scripts.

The manual (exploratory) testers (which even the Type 2 SDETs are), depending on the exposure they have had, come up with tests that are functional in nature but vary in depth. Most of the time the tests are shallow, not because the testers did not understand the application under test, but because they lacked the depth to come up with deeper tests.

For instance, in my early experience with testing, most functional test cases I saw had the expected result copy-pasted from the requirements document. You could take a test case document and reverse it back into a requirements document. These days there are stories or acceptance criteria, which are pretty much requirements. Oh, and then there are implicit requirements. Oh, and there are interpretations of the explicit requirements.

The training of A.I. 

If A.I. is all about training, here is a quick list of the sources from which an A.I. could learn:

  • Learn from manual (exploratory) testers testing an application
  • Learn from test cases written in the past
  • Learn from automation scripts
  • Learn from test reports
  • Learn from analytics
  • Learn from product code
  • Learn from App UI
  • Learn from all apps in the category

Our first step in implementing A.I. / M.L. happened after we programmed our bot to learn from one of these sources. We could automate the automation up to the point where it needed an assert statement to validate the core functionality (see the sketch below). My friend Alan Richardson gave me examples of Oracles for Heuristics generated out of such experiments, which could help validate some portion of the functionality. Validating business cases may be difficult, though.
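
To show where the automation of automation stops, here is a hedged sketch of the kind of script a bot could generate on its own; the driver calls (launch, tap, type_text, read_text) and the package name are made up for illustration. The steps (H) can be generated, but the assert (O) is the piece the bot does not know.

    # Hypothetical generated check: the bot can produce the steps, but the
    # assert that validates the core functionality is left as a gap.
    def generated_login_check(driver):
        driver.launch("com.example.app")        # hypothetical package name
        driver.tap("login_button")
        driver.type_text("username", "demo")
        driver.type_text("password", "demo")
        driver.tap("submit")

        actual = driver.read_text("greeting_label")
        # The oracle gap: what should `actual` be?
        # assert actual == ???   <- this is the knowledge the bot does not have.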

The contextual oracle problem

A discount coupon for an e-commerce store is supposed to work in some contexts and not work in others.

  • A coupon is supposed to work – if it has not been used before
  • A coupon is supposed to work – if it is used within the expiry time of the coupon
  • A coupon is not supposed to work – if it has been used before
  • A coupon is not supposed to work – if it has crossed its expiry date
  • A coupon is not supposed to work – for a certain product category
  • A coupon is not supposed to work – for a certain region
  • A coupon is not supposed to work – when certain products are in the cart and the cart does not exceed a certain price limit

For an A.I. to learn these many different oracles and report, “This test is a pass with Oracles 4, 5 and 6 but fails the others”, is not something humans would be able to grasp and make decisions from (see the sketch below). That kind of reporting would lead to someone doing a deeper investigation of every pass and fail, thereby increasing human activity and thereby, “according to some people”, slower feedback cycles.
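
To make this concrete, here is a hedged sketch of the coupon rules above encoded as named oracle predicates, together with the kind of per-oracle verdict described in the previous paragraph. All field names and rules are illustrative assumptions.

    # Hypothetical contextual oracles for the coupon example, evaluated per test.
    ORACLES = {
        "not_used_before":    lambda c, ctx: not c["used_before"],
        "within_expiry":      lambda c, ctx: ctx["now"] <= c["expires_at"],
        "category_allowed":   lambda c, ctx: ctx["category"] not in c["blocked_categories"],
        "region_allowed":     lambda c, ctx: ctx["region"] not in c["blocked_regions"],
        "cart_above_minimum": lambda c, ctx: ctx["cart_total"] >= c["min_cart_total"],
    }

    def evaluate_coupon_test(coupon, context):
        # Produces the "passes some oracles, fails others" style report that a
        # human then has to investigate before making a decision.
        results = {name: rule(coupon, context) for name, rule in ORACLES.items()}
        return {
            "passed": sorted(name for name, ok in results.items() if ok),
            "failed": sorted(name for name, ok in results.items() if not ok),
        }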

Bottom line questions in testing

  • Are we doing better test coverage?
  • Are we helping people make better decisions?

The way the media projects A.I. is what runs in our minds most of the time: something that can replace humans.

At AppAchhi, we do a lot more intelligent testing (resulting in deeper test coverage) than we did in our past lives, and every improvement to test coverage happening through AppAchhi right now isn’t because A.I. has come into the picture. It is because we are discovering a whole bunch of problems that have not been solved yet and that need solving before we can even implement A.I.

For instance, at AppAchhi we gave our customers a way to test their app in Low RAM conditions, which helped find performance risks that could only be found in Low RAM conditions. While this is cool for improving test coverage, it is not A.I.

We are discovering a lot of ways to improve test coverage and asking ourselves: do we need A.I. to solve this problem? Most of the time the answer is a big NO. It only tells me that we have let plenty of problems remain, and we are using A.I. as an excuse to wake up to the reality of the testing problem.

I think we are using A.I. as an excuse to wake up and solve testing problems that need to be solved.

Customers will buy what solves their functional testing problems and helps them scale their business. They are obsessed with having their problem solved in the most efficient manner. With or without A.I.