August 9, 2015

Groovy script structure done right



Recently, I was tasked with writing a relatively complex data migration script. The script connects to a MySQL database, queries existing data, and inserts it into a destination schema. Doing this in Bash would be error-prone and hard to test. A more modern language with good testing support, e.g. Ruby, Scala, or Groovy, is a better fit. We opted for Groovy because some of our team members have a Java background, so there is less friction when it comes to maintenance. This blog post shows how to set up a basic structure for Groovy scripting, with Spock for unit testing and Gradle for building.

Groovy CLI

First, we set up a basic script structure with Groovy's CliBuilder. Script: data-fix.groovy
#!/usr/bin/env groovy

def cli = new CliBuilder(usage:'data-fix')
cli.with {
    u longOpt: 'user', args: 1, argName: 'user', required: true, 'DB user'
    p longOpt: 'password', args: 1, argName: 'password', required: true, 'DB password'
    s longOpt: 'sourceSchema', args: 1, argName: 'sourceDbSchema', required: true, 'staging DB schema'
    d longOpt: 'destinationSchema', args: 1, argName: 'destDbSchema', required: true, 'production DB schema'
    h longOpt: 'host', args: 1, argName: 'dbHost', 'DB host, default to be localhost'
}

def opts = cli.parse(args)
if (!opts) {
    System.exit(1)
}

new Processor(opts).run()
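
To make it clearer what cli.parse hands back, here is a minimal sketch that parses a hard-coded argument list instead of the real command line. It assumes Groovy 2.4.x, where CliBuilder is available without an import; the argument values are made up for illustration, and only two of the options are declared to keep it short.

```groovy
// Sketch only: assumes Groovy 2.4.x (CliBuilder auto-imported from groovy.util)
def cli = new CliBuilder(usage: 'data-fix')
cli.with {
    u longOpt: 'user', args: 1, argName: 'user', required: true, 'DB user'
    h longOpt: 'host', args: 1, argName: 'dbHost', 'DB host'
}

// Hypothetical arguments standing in for the real command line
def opts = cli.parse(['-u', 'root'])

// Options that were supplied come back as their string value
assert opts.u == 'root'

// An option that was not supplied is falsy, so the Elvis operator can default it
def host = opts.h ?: 'localhost'
assert host == 'localhost'

// When a required option is missing, parse prints an error plus the usage
// text and returns null, which is why the script checks `if (!opts)`
def missing = cli.parse([])
assert missing == null
```

The null return on a parse failure is what makes the `System.exit(1)` guard in the script sufficient: the usage text has already been printed by the time the guard runs.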

Basic Processor class:
class Processor {
    def opts

    Processor(opts) {
        this.opts = opts
    }

    void run() {
        println "Running..."
    }
}

The above code can be viewed in this GitHub commit. Next up, we will set up unit testing.

Unit Testing with Spock and Gradle

Spock provides a nice testing framework. I am a fan of its easy mocking syntax and its BDD (Behaviour-Driven Development) style of "given, when, then" blocks. One way to set up Spock for Groovy is to use Gradle for building and dependency management.

By default, Gradle assumes a certain directory structure: src/main/groovy and src/test/groovy. (You can change this structure as described here.) We will move our code into that structure and create an empty test file, ProcessorSpec.groovy, under the src/test/groovy directory.
.
├── README.md
└── src
    ├── main
    │   └── groovy
    │       ├── data-fix.groovy
    │       └── Processor.groovy
    └── test
        └── groovy
            └── ProcessorSpec.groovy
Setting up build.gradle in the top directory:
apply plugin: "groovy"

version = "1.0"
description = "Spock Framework - Data fix Project"

// Spock works with Java 1.5 and above
//sourceCompatibility = 1.5

repositories {
  // Spock releases are available from Maven Central
  mavenCentral()
  // Spock snapshots are available from the Sonatype OSS snapshot repository
  maven { url "http://oss.sonatype.org/content/repositories/snapshots/" }
}

dependencies {
  // mandatory dependencies for using Spock
  compile "org.codehaus.groovy:groovy-all:2.4.1"
  testCompile "org.spockframework:spock-core:1.0-groovy-2.4"
  testCompile "cglib:cglib:2.2"
  testCompile "org.objenesis:objenesis:1.2"
}
Let's modify the file ProcessorSpec.groovy to contain a failing test, so that we can confirm the test is actually run and everything is set up correctly.
import spock.lang.*

class ProcessSpec extends Specification {
    def "#first test"() {
        when:
        def a = true

        then:
        a == false
    }
}
Run the Gradle build to see the test fail:
$ gradle --info clean test
...
Gradle Test Executor 2 finished executing tests.

ProcessSpec > #first test FAILED
    Condition not satisfied:

    a == false
    | |
    | false
    true
        at ProcessSpec.#first test(ProcessorSpec.groovy:9)

1 test completed, 1 failed
The above changes can be viewed in this GitHub commit.
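
For comparison with the deliberately failing test, a passing specification that uses the "given, when, then" blocks could look like the sketch below. The class name and the choice of StringBuilder as the object under test are purely illustrative and not taken from the project. It assumes Spock is on the test classpath, as configured in the build.gradle shown earlier.

```groovy
import spock.lang.Specification

// Illustrative spec only; StringBuilder is a stand-in for real code under test
class BufferSpec extends Specification {
    def "appending text grows the buffer"() {
        given: "an empty buffer"
        def buffer = new StringBuilder()

        when: "some text is appended"
        buffer.append("abc")

        then: "the buffer holds exactly that text"
        buffer.toString() == "abc"
        buffer.length() == 3
    }
}
```

Each line in the then: block is treated as an assertion, so no explicit assert keyword is needed. That is also why the failure report shown earlier can print the value of every sub-expression.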
The Gradle wrapper is a great way to ensure the build runs the same way across different machines. On a machine that does not have Gradle installed, it will first download Gradle and then execute the build task. We can set up the Gradle wrapper with this simple command:
$ gradle wrapper
# The above command generates the wrapper scripts, so we can now run our build with:
$ ./gradlew --info clean test

Adding libraries

We have the basic skeleton in place. The next step is to add logic to our script. The script will connect to a MySQL database, so we will add the MySQL connector (mysql-connector-java) as a dependency. In addition, I'm a fan of adding logging statements to the flow to help with debugging. We will use @Grab to add both dependencies to the script data-fix.groovy.
file: data-fix.groovy
#!/usr/bin/env groovy

@GrabConfig(systemClassLoader=true)
@Grab('mysql:mysql-connector-java:5.1.27')
@Grab('log4j:log4j:1.2.17')
...

file: Processor.groovy
import groovy.sql.Sql
import org.apache.log4j.*
import groovy.util.logging.*

@Log4j
class Processor {
    def opts

    Processor(opts) {
        log.level = Level.DEBUG
        this.opts = opts
    }

    void run() {
        log.info "Running..."
    }
}
Running the script gives the expected log statement. However, running the build now fails with this exception: Execution failed for task ':compileGroovy'. > org/apache/ivy/core/report/ResolveReport
[src/main/groovy] $ ./data-fix.groovy -h localhost -u root -p somepassword -s staging -d prod
INFO - Running...
[  top level dir] $ ./gradlew --info clean test
FAILURE: Build failed with an exception.
* What went wrong:
Execution failed for task ':compileGroovy'.
> org/apache/ivy/core/report/ResolveReport
So what went wrong? @Grab uses Grape to manage dependencies, while Gradle has its own dependency management. Grape relies on Apache Ivy, which is not on the classpath Gradle uses to compile the code, hence the missing org/apache/ivy class. At this point, we have two options: use Gradle to manage all dependencies and execute the script via Gradle, or mix and match Gradle and Grape (Grape for runtime, Gradle only for testing). Both options have their own merits. I prefer the simplicity of Grape at runtime, so I will continue with the latter.
We will need to configure build.gradle to ignore Grape:
test {                                        
  systemProperty 'groovy.grape.enable', 'false'  
}

compileGroovy {
  groovyOptions.forkOptions.jvmArgs = [ '-Dgroovy.grape.enable=false' ]
}
compileTestGroovy {
  groovyOptions.forkOptions.jvmArgs = [ '-Dgroovy.grape.enable=false' ]
}
The above change can be viewed in this GitHub commit.

This approach violates DRY (Don't Repeat Yourself), as dependencies are defined in two places: in @Grab and in the Gradle dependencies. If you would rather invoke the Groovy script from a Gradle task, have a look at mrhaki's blog post. I found passing the script's command line options as Gradle run properties a bit awkward.
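
To illustrate the duplication, the Gradle side would presumably need entries along these lines once Grape is disabled for the build. This is a sketch rather than the contents of the linked commit: the coordinates are copied from the @Grab annotations, and the compile configuration name follows the build.gradle shown earlier.

```groovy
// build.gradle fragment (sketch; mirrors the @Grab lines in data-fix.groovy)
dependencies {
  // Processor.groovy imports org.apache.log4j, so the compiler needs log4j
  compile "log4j:log4j:1.2.17"
  // the JDBC driver only matters to the build if tests open real connections
  compile "mysql:mysql-connector-java:5.1.27"
}
```

Whenever a version changes, both the @Grab annotation and the matching Gradle entry have to be updated together.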

Adding more logic and tests

Simple logic - default localhost if host is not provided

Now that we have a structure in place, we can add more logic to our script. The first easy piece is to set the host to the parameter provided, defaulting to 'localhost' otherwise.
file: ProcessorSpec.groovy
    def "#new set host to parameter, or default to localhost"() {
        expect:
        // an empty map stands in for parsed options with no -h given
        new Processor([:]).host == 'localhost'
        new Processor([h: 'myserver']).host == 'myserver'
    }

file: Processor.groovy
    def host  // declared alongside opts

    Processor(opts) {
        log.level = Level.DEBUG
        this.opts = opts
        // fall back to localhost when no -h option was given
        this.host = opts.h ?: 'localhost'
    }

    void run() {
        log.info "Host               : $host"
        log.info "User               : ${opts.u}"
        log.info "Password           : ${opts.p}"
        log.info "Source schema      : ${opts.s}"
        log.info "Destination schema : ${opts.d}"
    }
Running the tests and then the script:
[  top level dir] $ ./gradlew --info clean test
BUILD SUCCESSFUL
[src/main/groovy] $ ./data-fix.groovy -h myserver -u root -p somepassword -s staging -d prod
INFO - Host               : myserver
INFO - User               : root
INFO - Password           : somepassword
INFO - Source schema      : staging
INFO - Destination schema : prod
[src/main/groovy]$ ./data-fix.groovy -u root -p somepassword -s staging -d prod
INFO - Host               : localhost
INFO - User               : root
INFO - Password           : somepassword
INFO - Source schema      : staging
INFO - Destination schema : prod
The above changes can be viewed in this GitHub commit.
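
The test can pass a plain map because the constructor's opts parameter is untyped: anything that answers to .h will do. The sketch below isolates that idea in a simplified stand-in class (the name HostDefaultingProcessor is made up here, and logging is left out) so that it runs without log4j on the classpath.

```groovy
// Simplified stand-in for the post's Processor; hypothetical name, no logging
class HostDefaultingProcessor {
    def opts
    def host

    HostDefaultingProcessor(opts) {
        this.opts = opts
        // Elvis operator: use opts.h unless it is null, false or empty
        this.host = opts.h ?: 'localhost'
    }
}

// A missing key yields null, so the default applies
assert new HostDefaultingProcessor([:]).host == 'localhost'

// A supplied host is used as-is
assert new HostDefaultingProcessor([h: 'myserver']).host == 'myserver'

// CliBuilder reports an absent option as false, which also falls through
assert new HostDefaultingProcessor([h: false]).host == 'localhost'
```

This duck typing keeps the spec free of any command line parsing: the real script passes the object returned by cli.parse, while tests pass maps.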

Summary

As you can see, Groovy is easy to work with and powerful as a scripting language. Together with unit testing, it gives you confidence that your script does the right thing and is production-ready. I truly believe you should unit test everything, including scripts, and the setup above lets you do just that.
