Introduction to Calabash


last updated: 2016-10

This guide introduces the Calabash Framework, an Automated UI Acceptance Testing framework that allows you to write and execute tests that validate the functionality of iOS and Android Apps. It also introduces the concept of Behavior Driven Development and explains how to configure iOS and Android applications to be able to use Calabash in them.


Acceptance testing is an engineering practice used to prove that an application works according to predefined specifications. Acceptance testing can also be used to prove that changes to an application have not unintentionally broken existing functionality. This involves running a suite of high-level tests on an application. Traditionally, acceptance testing has been a manual process in which testers run all of the test scenarios by hand against the application. Manual testing is time consuming and expensive, which deters many teams from doing it.

Calabash is one framework that enables automated UI Acceptance Tests written in Cucumber to be run on iOS and Android applications. While Calabash integrates tightly with Xamarin.iOS and Xamarin.Android projects, it can also be used with iOS and Android projects written in the native languages of Objective-C and Java.

Acceptance tests are not written in isolation. Good tests require that developers and subject matter experts get together and discuss how the application should work. These discussions allow the developer to capture the behavior of the software in a simple, business-readable text file. The developer then writes the code so that these written tests pass. This style of software development is known as behavior driven development (BDD).

Behavior Driven Development (BDD) is a philosophy of Outside-in Development in which the application code is written after its externalities have been defined. It’s conceptually similar to Test-Driven Development (and is in fact based on it), but takes it one step further: instead of creating tests that describe the shape of APIs, application behaviors are specified.

The essential idea here isn’t very different from the practice of creating functional specifications for software and then building to the spec, but BDD makes those functional specifications executable by providing a means to test the actual specifications. As is typical with specification creation, BDD is meant to be a process in which multiple stakeholders weigh in to create a common understanding of what is to be built. The benefit of the BDD approach is that it helps ensure that the right software is built (as opposed to the software being built right). The intent is that the software is designed from the perspective of the business owner. That way, little interpretation is required as to how the software should behave.

There are three features that make BDD a powerful technique:

  • The communication between developers and subject matter experts results in a ubiquitous language that minimizes misunderstandings as the application is developed.
  • The acceptance tests provide clear and simple documentation about how the application should work.
  • The application has tests that will notify developers if any future changes to the application break existing functionality. This is especially powerful when combined as part of a continuous integration workflow that will build the app and run the tests each time a developer checks in code.

At a high level, the general workflow for testing a mobile app follows these steps:

  1. Write the feature – As described in the overview, the developer and a subject matter expert work together to write the Calabash feature. They create a text file (with the extension .feature) that describes how the scenarios should work.
  2. Run the feature – Next, the developer runs the feature that was created. It will fail because the steps have not yet been defined. However, Cucumber will help us out by providing some code snippets that we can use to create the step definitions.
  3. Create the Step Definitions – Using the snippets that Cucumber provided in step #2, we create a Ruby source code file, and paste the snippet output into it.
  4. Explore the Application using the Calabash Console – Not strictly necessary, this step involves starting up an instance of the Calabash console. The Calabash console is a command line utility that allows us to issue commands to the Calabash test server communicating with our app. We can use this to discover the Calabash queries necessary to interact with the UI object in the application.
  5. Update Step Definitions – Once we have figured out the Calabash queries for locating and manipulating the UI object, we can use these - along with the Calabash API - to implement the step definitions.
  6. Run the Feature – When the step definitions are finished, we can run the features. If this is a brownfield application, the functionality has already been coded and the tests should all pass. Sometimes the test is written before the feature is implemented in the application, in which case we would then add the feature to the application.
  7. Implement the Feature in the Application – This step isn't required when writing tests for apps that already exist. The developer will shift attention to the application and write the code to implement the desired functionality, and make the test pass.
  8. Repeat – Once the feature is implemented with a passing test, it is done. Time to move on and implement the next bit of functionality in the application.

Steps are written using the business language of the application to specify the expected behavior. They describe the behavior from a business standpoint, rather than a "click here, do this" standpoint. For example, if you’re creating a credit card validation test, you might write a custom step of: "I enter a credit card number that's [x] digits long". The step definition then knows to do multiple things: first, set the focus to the credit card text field, and then enter a string of numbers of the specified length. Custom steps written this way offer the following advantages:

  • They are more concise – Custom steps allow you to do more work in fewer steps. Consider the credit card validation example from before. The logic for locating the credit card field is abstracted away in the step definition. If the UI changes, the step definition can be changed, but the behavior remains the same. Another place this becomes obvious is in navigating through complex apps. If a feature to be tested is 10 screens deep, it would require, at a minimum, 10 different steps to navigate to it. However, a single custom step such as "given we are on the credit card validation screen" could be defined to handle all the required navigation in a single step.
  • They allow for more code reuse – Because a lot of the work in custom steps is encapsulated in the step definitions, it allows for code reuse between steps. Take the previous example of navigation, for instance. With a custom step, that navigation is contained in one step that could be reused across many different tests.
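The navigation example above can be sketched roughly as follows. This is an illustration only: touch is a stub that records calls so the code runs without Calabash or a device, and the screen labels and the helper name navigate_to_credit_card_screen are hypothetical.

```ruby
# Stub standing in for Calabash's touch so this sketch runs without a device.
TOUCHED = []
def touch(query)
  TOUCHED << query
end

# Hypothetical path through the app to the credit card validation screen.
NAVIGATION_PATH = ["Settings", "Payments", "Add Card"].freeze

# One reusable helper that a single custom step
# ("Given we are on the credit card validation screen") could call.
def navigate_to_credit_card_screen
  NAVIGATION_PATH.each { |label| touch("button marked:'#{label}'") }
end

navigate_to_credit_card_screen
puts TOUCHED.length
```

If the route through the app changes, only NAVIGATION_PATH needs updating; every scenario that uses the custom step stays the same.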

It’s not necessary to follow the BDD methodology in order to use Calabash effectively, but having a basic understanding of it goes a long way in making intelligent testing decisions. For a much more detailed discussion about BDD, we recommend Matt Wynne and Aslak Hellesøy’s book, The Cucumber Book: Behaviour Driven Development for Testers and Developers. In addition to examining the philosophy of BDD, it also provides an in-depth examination of writing Gherkin specifications and Cucumber tests.


This guide assumes that you are familiar with the concepts from the Introduction to Xamarin Test Cloud guide.

Running Calabash requires the following:

  • Ruby >= 2.0; 2.3.1 is preferred (note that it is not possible to use the default System Ruby that is installed with OS X)
  • The use of a Ruby gem manager such as Bundler is strongly encouraged.
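As a minimal sketch of a Bundler setup, a Gemfile along these lines would declare the Calabash gems for a test project. The gem names calabash-cucumber (iOS) and calabash-android are the published Calabash gems; listing both and leaving versions unpinned are assumptions made for illustration, not something this guide prescribes.

```ruby
# Gemfile (sketch; versions deliberately left unpinned as an assumption)
source "https://rubygems.org"

gem "calabash-cucumber" # Calabash for iOS
gem "calabash-android"  # Calabash for Android
```

With a file like this in the project directory, bundle install installs the gems, and prefixing commands with bundle exec runs them against the versions Bundler resolved.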

Calabash tests can be written with any text editor. RubyMine is an excellent commercial Ruby IDE with support for Cucumber. Visual Studio Code is a free editor with plugins that provide some support for Ruby and Cucumber.


Using Calabash on iOS has these additional requirements:

  • An iOS Device or simulator that has been configured for development.
  • Xcode 8
  • An App Bundle that has the Xamarin Test Cloud Agent embedded.

Behavior Driven Development with Cucumber

In Cucumber, acceptance tests are known as features. Features are text files with the .feature extension that are written in plain language understood by developers and business users alike. A feature is used to test one particular function of an application. Features are broken down into one or more scenarios. A scenario is a test that verifies that some functionality of an application has been properly implemented. It exists within the context of the feature. A scenario is in turn subdivided into steps – a list of sentences that describe the pre-conditions for the scenario, the actions a user would take, and the expected results of the user’s actions. Each step is matched to a step definition – a piece of Ruby code that executes when the test runs.

The following diagram illustrates how all these pieces fit together:

Step definitions use the Calabash Ruby APIs to interact with the application while it is running on a device or in the simulator/emulator. The APIs contain methods to simulate user actions such as touching the screen, entering text into text fields, and more. Calabash also provides a rich query language to locate and interact with objects on the screen.

Let's look at each of these concepts in a bit more detail.


Gherkin isn’t a language, but instead a set of grammar rules that allow you to specify behaviors in any natural language, which can then be parsed and executed using the Cucumber framework.

For a Gherkin reference, see the Gherkin Wiki, which is part of the Cucumber project.


A scenario specifies a single behavior or use case within a given feature and is composed of various steps. For example, the following scenario describes the behavior of ensuring that a credit card input field has the correct number of digits:

Scenario: Credit card number is too short
  Given I use the native keyboard to enter "123456" into text field number 1
  And I touch the "Validate" button
  Then I see the text "Credit card number is too short."

Steps usually begin with one of the keywords Given, When, Then, And, or But. They don’t have to, however: * can be used in place of any of those keywords. In fact, Cucumber does not distinguish between the keywords (or *). They are instead meant to give stakeholders a language hint, based on cause and effect, as to what is being described.

As such, simply recognizing their language implications is enough to use them effectively. However, for a detailed examination of these keywords, see the Cucumber Wiki entry on them. For a philosophical discussion, see Robert C. Martin’s blog post on them.
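To make the point about interchangeable keywords concrete, here is a toy model of step matching in plain Ruby. It is not Cucumber's implementation; the constant names, the single registered pattern, and the run_step helper are all invented for this sketch.

```ruby
# Toy model of step matching; NOT Cucumber's real implementation.
STEP_KEYWORDS = /\A\s*(Given|When|Then|And|But|\*)\s+/

# Hypothetical registry mapping a step pattern to the code that handles it.
STEP_DEFINITIONS = {
  /^I touch the "([^"]*)" button$/ => ->(label) { "touched #{label}" }
}

# Drop the leading keyword, then run the first definition whose pattern matches.
def run_step(line)
  text = line.sub(STEP_KEYWORDS, "")
  STEP_DEFINITIONS.each do |pattern, handler|
    match = pattern.match(text)
    return handler.call(*match.captures) if match
  end
  raise ArgumentError, "Undefined step: #{text}"
end

puts run_step('Given I touch the "Validate" button')
puts run_step('And I touch the "Validate" button')
puts run_step('* I touch the "Validate" button')
```

All three calls reach the same handler, because the keyword is discarded before the pattern is compared with the step text.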

Feature Definitions

Rarely is a feature defined by a single behavior. For this very reason, Scenarios can be grouped together logically under a Feature Definition. Feature definitions are typically given a name and an optional, short description.

For example, the following Feature describes several scenarios or behaviors of credit card validation:

Feature: Credit card validation.
Credit card numbers must be exactly 16 characters.

Scenario: Credit card number is too short
    Given I use the native keyboard to enter "123456" into text field number 1
    And I touch the "Validate" button
    Then I see the text "Credit card number is too short."

Scenario: Credit card number is too long
    Given I try to validate a credit card number that is 17 characters long
    Then I should see the error message "Credit card number is too long."

As mentioned earlier, because these specifications can be executed, they are known as Executable Specifications. Cucumber will take a feature definition, parse it, and execute each scenario as described by the step definitions.
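The parsing stage can be pictured with a deliberately simplified sketch in plain Ruby. A real Gherkin parser handles much more (tags, backgrounds, data tables, other languages); the scenarios_in helper and the FEATURE_TEXT constant are names made up for this example, which only groups step lines under their scenario.

```ruby
# Toy parser for illustration; real Gherkin parsing covers far more cases.
FEATURE_TEXT = <<~FEATURE
  Feature: Credit card validation.
  Credit card numbers must be exactly 16 characters.

  Scenario: Credit card number is too short
      Given I use the native keyboard to enter "123456" into text field number 1
      And I touch the "Validate" button
      Then I see the text "Credit card number is too short."

  Scenario: Credit card number is too long
      Given I try to validate a credit card number that is 17 characters long
      Then I should see the error message "Credit card number is too long."
FEATURE

# Returns a hash mapping each scenario name to its list of step lines.
def scenarios_in(feature_text)
  scenarios = {}
  current = nil
  feature_text.each_line do |raw|
    line = raw.strip
    if line.start_with?("Scenario:")
      current = line.sub("Scenario:", "").strip
      scenarios[current] = []
    elsif current && !line.empty?
      scenarios[current] << line
    end
  end
  scenarios
end

scenarios_in(FEATURE_TEXT).each do |name, steps|
  puts "#{name}: #{steps.length} steps"
end
```

Each step line collected here would then be handed to whichever step definition matches it.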

Step Definitions

Step definitions are like code-behind for the scenarios defined in Gherkin. They provide the glue that makes them runnable in the application. Their sole function is to translate the Gherkin into runnable actions.

For example, the following code contains the required step definitions to run the credit card validation tests from above:

# Enters a number of the requested length and submits it for validation.
Given(/^I try to validate a credit card number that is (\d+) characters long$/) do |number_of_digits|
  touch("textField marked:'CreditCardNumberField'")
  keyboard_enter_text("9" * number_of_digits.to_i)
  touch("button marked:'Validate'")
end

# Fails the step unless the expected message is shown in the error field.
Then(/^I should see the error message "(.*?)"$/) do |error_message|
  text_view = query("textView marked:'ErrorMessagesField' {text CONTAINS '#{error_message}'}")
  raise "The error message '#{error_message}' is not visible in the View." unless text_view.any?
end

The step definitions make use of the methods in the Calabash API such as touch and query. These methods take queries that tell Calabash how to locate views on the screen.
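One detail in the first step definition is the call to to_i. The sketch below reproduces the matching in plain Ruby, without Cucumber or Calabash, to show why that conversion is there: a regular expression capture is a String, so it has to be converted before it can be used as a repeat count. The variable names are only for illustration.

```ruby
# Standalone illustration; no Calabash or Cucumber required.
PATTERN = /^I try to validate a credit card number that is (\d+) characters long$/
step_text = "I try to validate a credit card number that is 17 characters long"

# The first capture group holds the digits matched by (\d+).
number_of_digits = PATTERN.match(step_text).captures.first

# The capture is a String ("17"), so convert it before using it as a count.
card_number = "9" * number_of_digits.to_i

puts number_of_digits.class
puts card_number.length
```

The same step definition therefore serves any length named in a scenario, such as 15 or 17 characters, without further code.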

Cucumber + Calabash

While Gherkin defines the tests and the step definitions provide the glue code to translate the tests into something actionable, Cucumber is the framework that actually runs the tests.

However, Cucumber is a generic framework for testing; it requires an Automation Library that plugs in and allows Cucumber to execute on a particular platform or technology. This architecture allows Cucumber tests to be written for nearly any platform, as long as there is an Automation Library that provides platform support. The Cucumber technology stack then looks something like this:
