Handy Groovy for Batch Processing, Systems Integration and Data Migration

Groovy comes in very handy when doing batch processing, database access, file and directory modifications and data migrations. You can easily read data from one host and store it without intermediate storage directly at the destination hosts database system. Groovy accesses flawlessly JEE servers (EJB), web services and other remote services. With it’s polyglot behaviour, most data integration scenarios are addressable.

I frequently use it’s capabilities instead of shell scripting. Many handy file and directory access and manipulation methods and much more are made available from within the scripting language through AntBuilder, a DSL like interface to well known Apache Ant. It is also possible to start external processes, manage these and pipe ones output as the input to another process’s input (Groovy Process Management).

Maintenance scripts written in Groovy, need not be compiled. They are simply stored as text files and may be executed directly from the shell, cron or other scheduling tools like UC4. Technical support personnel and system administrators then may look at the source of the scripts in case of errors or other difficulties (ok – that’s a matter of taste – one could argue that if the manual is detailed enough, there is no need to look at a batch job’s code – unless you are on late shift 🙂 ).

If you have a full Groovy installation located at your production machine, then starting a Groovy script is as easy as this (supposing that the directory {groovy-home}/bin is included in your PATH variable):

1
> groovy my-groovy-script.groovy

If your systems administrator is not fond of installing Groovy, then that’s fine also – just need to deploy somewhere the groovy-all-x.x.x.jar (found in {groovy-home}/embeddable) and reference it in your class path.

I have shell scripting files that set up a decent execution environment that serves our needs in respects to logging and the like, prior to executing the actual Groovy code. A simple line like this then starts the Groovy script:

1
> /opt/jdk/bin/java -classpath $CLASSPATH groovy.ui.GroovyMain $@

We usually include helper classes in the classpath that give the Groovy scripting code access to all resources needed, such as databases and JEE servers. Those helpers are aware of the execution environment. If the script is executed on a integration test server, then only connections and service locators to integration testing databases and application servers are accessible. Which is fully transparent to the programmer.

The following interface gives an outline of an utility, that processes local configuration files from the application directory and then offers JDBC connection creation and JEE service lookups.

1
2
3
4
5
6
7
interface ConfigUtil {
 
    public Connection createTradesDbConnection() throws SQLException
    public Connection createMasterDbConnection() throws SQLException
    public ServiceLocator createServiceLocator() throws Exception
 
}

Now it’s very easy to create for instance an index maintenance script for SAP Sybase ASE database servers, that’s periodically called and checks if the changed rows in a table make 30 % or more. If so, it executes an UPDATE INDEX STATISTICS command against that table.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
import java.sql.Connection;
 
import de.tmw.batch.DefaultConfigUtil;
import groovy.sql.Sql;
 
configUtil = new DefaultConfigUtil()
sql = new Sql(configUtil.createMasterDbConnection())
 
highWaterMark = 30.0 //we do an update index statistics when 30% of the rows have been changed
 
tables = ["Trade", "Inventory", "Customer", "Instrument"]
 
tables.each() { table ->
    if (sql.firstRow("select object_id(?) as 'id'", table).id != null) {
        datachange = sql.firstRow("select datachange(${table} ,null ,null) as 'change'").change
        if (datachange != null ) {
            println "${table} has datachange = ${datachange} %; threshold is ${highWaterMark}."
            if (datachange >= highWaterMark) {
                println "   Updating indices ..."
                sql.execute("update index statistics ${table}".toString())
            }
        }
    }
    else {
        println "Skipping $table (unknown) ..."
    }
 
}

In reality, this script is much more complex and searches through dozens of partitions via an to our needs tailored algorithm. Thanks to the excellent SQL integration in Groovy, this is really easy and helpful, transparent and fun to work with.

Leave a Reply

Your email address will not be published. Required fields are marked *