Extracting Complex Report Data from Relational Databases with Native Queries and JPA

For reporting and other tasks, it may not be enough to create JPA entities that are mapped to physical tables or views. Complex queries may have to be constructed on the fly. Large datasets may be split over different tables that share the same column layout, making views a difficult approach. Group-by clauses may change depending on the user's reporting needs. There are quite a few such scenarios when dealing with complex reporting and data requirements, for example controlling market risk, calculating profit and loss figures, or complex ad-hoc reporting. What options do we have when we need to encapsulate the reporting logic in a Java EE component such as a CDI bean or an EJB and offer a service interface for remote clients?

  • Execute a native query on an entity manager instance and handle a list of lists of objects?
  • Retrieve the JDBC connection object and do some “magic”?
  • Resort to Spring or even MyBatis?

JPA is a very elegant solution when it comes to CRUD (create, read, update, delete). What is less known: JPA also has something in store for these more complex querying and reporting needs!

Entities without a Table

Usually, you define a JPA entity as a Java POJO with the annotations @Entity and @Table (this could also be done in an XML mapping file – I’ll use annotations in this post).

An entity may also be simply used as the result row of an arbitrary native query. And yes: native queries are the right tool for the job.

Keep in mind that, as Gavin King (the inventor of Hibernate) stated in a remarkable comment, ‘Systems like Hibernate [and JPA] are designed as “leaky abstractions”’ (see here). In our case, the “leaky abstraction” is the possibility to use native SQL in JPA and thereby bypass the ORM layer. It is possible and intended to use plain SQL in situations where the ORM is not the right tool for the job. What’s nice: a native query even does the mapping of the query results to a POJO for you.

The population of the POJO and the mapping of result-set columns to POJO properties is done by your JPA provider. Just do not annotate your entity with @Table! In the example below I use a heavily stripped-down entity that represents a “tableless” entity holding P&L (profit and loss) data by portfolioId and valDate (valuation date).

import java.io.Serializable;
import java.util.Date;

import javax.persistence.Entity;
import javax.persistence.Id;
import javax.persistence.Temporal;
import javax.persistence.TemporalType;

@Entity
public class PortfolioAggregation implements Serializable {
 
    private static final long serialVersionUID = 1L;
 
    //This resembles the business key:
    private Date valDate;
    private long portfolioId;
    //KEY - END
 
    private double plNet;
    private double plGross;
 
    public PortfolioAggregation() {
    }
 
    @Id
    @Temporal(TemporalType.DATE)
    public Date getValDate() {
        return valDate;
    }
 
    public void setValDate(Date valDate) {
        this.valDate = valDate;
    }
 
    @Id
    public long getPortfolioId() {
        return portfolioId;
    }
 
    public void setPortfolioId(long portfolioId) {
        this.portfolioId = portfolioId;
    }
 
    public double getPlNet() {
        return plNet;
    }
 
    public void setPlNet(double plNet) {
        this.plNet = plNet;
    }
 
    public double getPlGross() {
        return plGross;
    }
 
    public void setPlGross(double plGross) {
        this.plGross = plGross;
    }
 
    //Class definition truncated from here ... 
 
}

Now, when creating the native query, you simply pass, in addition to the SQL string, the class of the entity that is to be populated from each result-set row of the query.

String sql = "select a.valDate valDate, a.portfolioId portfolioId, sum(plNet) plNet, sum(plGross) plGross FROM ..."; //SQL truncated!
Query q = entityManager.createNativeQuery(sql, PortfolioAggregation.class);

@SuppressWarnings("unchecked")
List<PortfolioAggregation> aggs = q.getResultList();

The SQL is not complete – it only shows the basic usage; the real query is much more complex. It is important that each column name in the result relates to a property name of the entity. As you can see, the query also contains sum() aggregate functions. I used alias names for the aggregate functions that in turn relate to property names of our PortfolioAggregation class. This information is needed so that the JPA implementation is able to properly populate your entity instances – a nice feature, in my eyes.

Callers of the portfolio aggregation service that receive a list of PortfolioAggregation objects may use them freely: feed them into reporting tools such as Jasper Reports, show the data in tables on the client side, and even manipulate it (since setters are included – which is not strictly necessary) to show changes in different scenarios, or implement other methods that manipulate the data (transiently) according to your clients' needs. Also be aware that this entity is not persistable! From a persistence perspective, it is read-only.

What I do not show in this code snippet is how the actual SQL query is built. This can become messy. Database views or named native queries might help, but more often plain string concatenation is used :-(
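One partial mitigation for the string-concatenation mess is to at least keep the dynamic parts (such as the user-selected group-by columns) behind a whitelist, so no raw input ever reaches the SQL string. Here is a minimal sketch of that idea; the class, table, and column names are hypothetical and not taken from the actual application:

```java
import java.util.List;
import java.util.Set;

// Sketch: assemble a dynamic aggregation query from whitelisted group-by
// columns. The table name "pl_data" and the allowed columns are assumptions
// for illustration only.
public class AggregationSqlBuilder {

    private static final Set<String> ALLOWED_GROUP_COLUMNS =
            Set.of("valDate", "portfolioId", "tradeId");

    public static String buildSql(List<String> groupByColumns) {
        for (String col : groupByColumns) {
            // Reject anything outside the whitelist instead of concatenating it.
            if (!ALLOWED_GROUP_COLUMNS.contains(col)) {
                throw new IllegalArgumentException("Unknown group-by column: " + col);
            }
        }
        String cols = String.join(", ", groupByColumns);
        return "SELECT " + cols + ", sum(plNet) plNet, sum(plGross) plGross"
                + " FROM pl_data GROUP BY " + cols;
    }
}
```

This keeps the concatenation in one place and makes it harder for unexpected input to end up in the query, but it is still string building at heart.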

Currently I am looking for ways around this dilemma. In a short conversation on Twitter, Lukas Eder pointed me to a possible solution: jOOQ. jOOQ delivers a fluent API to construct SQL queries, and results may be mapped to POJOs. Quite interesting – I’ll definitely have a look at it and blog about my experiences with it – stay tuned!

 

NetBeans Smart Semicolon — for German Keyboards

You know it – at least when using the Eclipse code editor: type a semicolon while editing a string literal, and the semicolon is placed at the end of the line with your cursor positioned after it. Unfortunately this happens even when you intend to add a semicolon to the string literal itself. A similar function is also included in the NetBeans Java editor, implemented in a more practical way, I think: Ctrl+SEMICOLON adds a semicolon to the end of the line, and Ctrl+Shift+SEMICOLON adds a semicolon to the end of the line and additionally creates a new one (in other words: adds a line break after the semicolon).

If you live in Germany, that shortcut does not work: the semicolon is on the same key as the comma and only reachable while holding the Shift key! But that produces a different key combination, and the result is not what you want. We need to adapt the NetBeans keymap for the Java editor. The keymap is found under “Tools/Options”.

Options Dialog

After clicking the “Keymap” icon in the Options dialog, the key bindings are displayed. The “Search” text box lets you quickly find the needed entries “Complete Line” and “Complete Line and Create new Line”. In the picture above I simply entered “com” for “complete” and the search already hit the right entry.

In the shortcut column for “Complete Line”, hold the Ctrl key and type comma. The key combination “Ctrl+COMMA” is entered in the column. For “Complete Line and Create new Line”, hold down the Ctrl and Alt keys and type comma. The key combination “Ctrl+Alt+COMMA” is entered in the respective line’s shortcut column. Both key combinations are already reserved for other shortcuts, so a warning is displayed. Ignore the warning, click “Apply”, and you’re done. Of course, if you want to keep the original key bindings, you have to come up with your own combinations!

This gives you two very welcome editor shortcuts on the German keyboard layout. Try it out!

For even more cool shortcuts and hints, have a look here (blog post in German).

Multi-Tenancy with Java EE and JBoss

Multi-tenancy is a frequent requirement in large business applications. Besides the requirement to separate different clients or environments, multi-tenancy may also be advantageous in terms of distributing your data client-wise across multiple databases. Currently, frameworks such as Hibernate and other APIs are not of much help here.

A large business application for market risk evaluation that needed services and a Java EE compatible front end is designed exactly this way: each tenant has its own database. What makes it even more complex: each tenant has not just one database for its entire data needs, but one per product class. Huge amounts of data are produced every night and then need to be analysed, corrected where necessary, and calculations re-triggered.

I used an entity manager delegate that looks up the needed entity manager for the current tenant/product-class combination via JNDI on each call. This is almost transparent to the clients of the delegate, except that – and that should be clear – an identifier for the tenant/product-class tuple needs to be supplied somehow.
The JBoss JNDI lookup of entity managers works flawlessly as of WildFly 8.0 and JBoss EAP 6.1.1. In earlier versions the lookup failed: https://issues.jboss.org/browse/WFLY-299.

Here is the code for the multi-tenant entity manager:

@Stateless
@LocalBean
public class MultiTenantEntityManager {

    private static final String JNDI_ENV = "java:app/entitymanager/";

    @Resource
    private SessionContext ctx;

    public MultiTenantEntityManager() {
    }

....

    public Query createQuery(String dataSourceName, String string) {
        return entityManager(dataSourceName).createQuery(string);
    }

    public <T> TypedQuery<T> createQuery(String dataSourceName, CriteriaQuery<T> cq) {
        return entityManager(dataSourceName).createQuery(cq);
    }

    public <T> TypedQuery<T> createQuery(String dataSourceName, String string, Class<T> type) {
        return entityManager(dataSourceName).createQuery(string, type);
    }

    public void detach(String dataSourceName, Object o) {
        entityManager(dataSourceName).detach(o);
    }

    public <T> T find(String dataSourceName, Class<T> type, Object o) {
        return entityManager(dataSourceName).find(type, o);
    }

    public <T> T find(String dataSourceName, Class<T> type, Object o, Map<String, Object> map) {
        return entityManager(dataSourceName).find(type, o, map);
    }

    public <T> T find(String dataSourceName, Class<T> type, Object o, LockModeType lmt) {
        return entityManager(dataSourceName).find(type, o, lmt);
    }

    public <T> T find(String dataSourceName, Class<T> type, Object o, LockModeType lmt, Map<String, Object> map) {
        return entityManager(dataSourceName).find(type, o, lmt, map);
    }

....

    private EntityManager entityManager(String dataSourceName) {

        final EntityManager entityManager = (EntityManager) ctx.lookup(JNDI_ENV + dataSourceName);

        if (entityManager == null) {
            throw new RuntimeException("Unknown data source name '" + dataSourceName + "'.");
        }

        return entityManager;

    }

}

I left out most of the delegate methods in order to clearly outline the idea. Each method call to the delegate has an additional parameter: the name of the datasource. The name is a string that follows a simple scheme, “tenant_productClass”, and thus identifies the database we need to access. You are free to come up with your own naming scheme that suits your needs best.
I did tests with configurations consisting of up to 700 datasource pools. JBoss EAP managed them without any difficulty (ok, at one point during testing the database servers gave up – but that was an easy fix). The datasource name is also part of the JNDI lookup string for the real JPA entity manager. The delegate first looks up the entity manager in the JNDI tree and then calls the actual entity manager's method with the supplied parameters.
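The naming scheme can be captured in a tiny helper so that it lives in exactly one place. This is a hypothetical sketch – the class, method, and validation rules are mine, not part of the original code:

```java
// Hypothetical helper for the "tenant_productClass" naming scheme described
// above. The validation rules are assumptions for illustration: the tenant
// must not contain the separator character, or the name becomes ambiguous.
public class DataSourceNames {

    public static String dataSourceName(String tenant, String productClass) {
        if (tenant == null || tenant.isEmpty() || tenant.contains("_")) {
            throw new IllegalArgumentException("Invalid tenant: " + tenant);
        }
        if (productClass == null || productClass.isEmpty()) {
            throw new IllegalArgumentException("Invalid product class: " + productClass);
        }
        return tenant + "_" + productClass;
    }
}
```

A call like `DataSourceNames.dataSourceName("tenant1", "eq")` then yields the name that is passed to the delegate on every invocation.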

Here is an example persistence.xml for a persistence unit (as there are many of them, I generate them by script):

  <persistence-unit name="tenant1_eq" transaction-type="JTA">
    <jta-data-source>java:/jboss/datasources/itpdb_pdbeq</jta-data-source>
    <exclude-unlisted-classes>false</exclude-unlisted-classes>
    <properties>
      <property name="hibernate.show_sql" value="true"/>
      <property name="hibernate.format_sql" value="true"/>
      <property name="hibernate.dialect" value="org.hibernate.dialect.SybaseASE15Dialect"/>
      <property name="hibernate.cache.use_second_level_cache" value="false"/>
      <property name="hibernate.jdbc.batch_size" value="50"/>
      <property name="hibernate.jdbc.batch_versioned_data" value="true"/>
      <property name="hibernate.order_inserts" value="true"/>
      <property name="hibernate.order_updates" value="true"/>
      <property name="hibernate.generate_statistics" value="true"/>
      <property name="jboss.entity.manager.jndi.name" value="java:app/entitymanager/tenant1_eq"/>
      <property name="jboss.entity.manager.factory.jndi.name" value="java:app/entitymanagerfactory/tenant1_eq"/>
    </properties>
  </persistence-unit>
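Since each tenant/product-class combination needs such a persistence unit, a small generator keeps the many units consistent. The sketch below is hypothetical – the original generation script is not shown, and the datasource naming inside the fragment is an assumption for illustration:

```java
import java.util.List;

// Hypothetical sketch of a script that emits one persistence-unit fragment per
// tenant/product-class combination. Property lists and datasource naming are
// simplified assumptions; the real persistence.xml carries more properties.
public class PersistenceXmlGenerator {

    public static String persistenceUnit(String tenant, String productClass) {
        String name = tenant + "_" + productClass;
        return ""
            + "  <persistence-unit name=\"" + name + "\" transaction-type=\"JTA\">\n"
            + "    <jta-data-source>java:/jboss/datasources/" + name + "</jta-data-source>\n"
            + "    <properties>\n"
            + "      <property name=\"jboss.entity.manager.jndi.name\""
            + " value=\"java:app/entitymanager/" + name + "\"/>\n"
            + "    </properties>\n"
            + "  </persistence-unit>\n";
    }

    public static String forAll(List<String> tenants, List<String> productClasses) {
        StringBuilder sb = new StringBuilder();
        for (String tenant : tenants) {
            for (String productClass : productClasses) {
                sb.append(persistenceUnit(tenant, productClass));
            }
        }
        return sb.toString();
    }
}
```

Running the generator over all tenants and product classes produces the body of the persistence.xml in one go, so adding a tenant is a one-line change.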

The multi-tenant entity manager is injected into each bean that would normally have an entity manager injected. For each call, the name of the datasource to which the entity manager refers has to be known, as it is also part of the entity manager's JNDI name. Below is a snippet of a boundary that uses the multi-tenant entity manager.

@Stateless
@Remote(AggregationConfig.class)
@TransactionAttribute(TransactionAttributeType.REQUIRES_NEW)
public class AggregationConfigBean implements AggregationConfig {

    private static final Logger LOG = LoggerFactory.getLogger(AggregationConfigBean.class.getSimpleName());

    @EJB
    MultiTenantEntityManager entityManager;

    /**
     * Default constructor.
     */
    public AggregationConfigBean() {
    }

    @Override
    public List<AggConfigMeta> findConfigMeta(String dataSourceName) {

        List<AggConfigMeta> res = Lists.newArrayList();

        Query q = entityManager.createNativeQuery(dataSourceName, "SELECT meta FROM ConfigMeta WHERE type='config'");
        List<String> result = q.getResultList();

        LOG.info("Found " + result.size() + " meta entries.");

        ObjectMapper mapper = new ObjectMapper();

        LOG.info("Instantiated ObjectMapper...");

        for (String meta : result) {
            try {
                res.add(mapper.readValue(meta, AggConfigMeta.class));
            } catch (IOException ex) {
                LOG.error("Failed reading AggConfig meta data.", ex);
            }
        }

        return res;

    }

.....

}

The usage is actually straightforward. This approach may also be used to handle database sharding in a similar way, see database-sharding.
It opens a window to a whole lot of new ideas on how to handle large amounts of data with Java EE.

NetBeans and Maven Profiles

Quite often you need to build an artifact for different environments. For instance: I need to deploy an ejb-jar file containing a persistence.xml file that links its persistence units to datasources via JNDI entries. The datasources differ in each environment (development, acceptance test, and production) and are configured, in my case, within a Red Hat JBoss application server. This is usually done by application management or the server administrators. All the development team needs to know is the JNDI name the persistence unit refers to.

So far there would be no need for different persistence.xml files. Unfortunately, in the development and acceptance-test environments additional databases are available for complex testing scenarios. Also, multi-tenancy is implemented in a way that each tenant has its own database. This alone leads to a different number of databases in each environment and hence to a different number of persistence units and an environment-dependent persistence.xml. To meet this requirement, I build one ejb-jar file per environment with a specific persistence.xml file in the META-INF directory.

The project directory layout is:

  • ROOT
    • env
      • dev
      • int
      • prod
    • src
      • main
        • java
        • resources
      • test
    • target

Underneath the env directory are three subdirectories, one for each environment. They contain the relevant persistence.xml files.

Here is where Maven profiles come in: for each environment I set up a separate profile that adds the respective environment directory as a resource directory.

    <profiles>
        <profile>
            <id>dev</id>
            <activation>
                <activeByDefault>true</activeByDefault>
            </activation>
            <properties>
                <build.profile.id>dev</build.profile.id>
            </properties>
            <build>
                <resources>
                    <resource>
                        <directory>
                            env/${build.profile.id}
                        </directory>
                    </resource>
                    <resource>
                        <directory>
                            src/main/resources
                        </directory>
                    </resource>
                </resources>
            </build>
        </profile>
        <profile>
            <id>int</id>
            <activation>
                <activeByDefault>false</activeByDefault>
            </activation>
            <properties>
                <build.profile.id>int</build.profile.id>
            </properties>
            <build>
                <resources>
                    <resource>
                        <directory>
                            env/${build.profile.id}
                        </directory>
                    </resource>
                    <resource>
                        <directory>
                            src/main/resources
                        </directory>
                    </resource>
                </resources>
            </build>
        </profile>
        <profile>
            <id>prod</id>
            <activation>
                <activeByDefault>false</activeByDefault>
            </activation>
            <properties>
                <build.profile.id>prod</build.profile.id>
            </properties>
            <build>
                <resources>
                    <resource>
                        <directory>
                            env/${build.profile.id}
                        </directory>
                    </resource>
                    <resource>
                        <directory>
                            src/main/resources
                        </directory>
                    </resource>
                </resources>
            </build>
        </profile>
    </profiles>

Since NetBeans uses Maven to organize its project structure, it works very nicely with Maven profiles. Right-click the project root in the “Projects” tab and select “Set Configuration” to choose the current Maven profile:

setconfiguration1

The really cool thing is: depending on the profile you choose, NetBeans shows the profile-related resource directory under “Other Sources”. In this case I selected the “dev” environment and NetBeans links to the folder “env/dev”:

dev1

After selecting the “int” environment (which in my organization is also used for acceptance testing), NetBeans links to the “env/int” folder:

int

Pretty cool! This way you can easily switch between Maven profiles, edit the profile related resources and build your artifact.

Left Over Ghost Breakpoints

If you use the same library jar files with source attachments in different NetBeans projects, it can happen that breakpoints you set in one project magically appear in another project, too. Unfortunately, they are not deletable there – only in the project where they were originally set.

Often I have to switch back and forth between many projects, and getting rid of these “ghost” breakpoints proved difficult. Here is what to do:

On Windows, NetBeans stores workspace configuration data in your Windows home directory under AppData/Roaming/NetBeans/version-number. The file “org-netbeans-modules-debugger-Settings.properties”, located in the subdirectory config/Services, stores the breakpoints. After deleting them there, you definitely get rid of them!