Extracting Complex Report Data from Relational Databases with Native Queries and JPA

For reporting and other tasks, it may not be enough to create JPA entites that are mapped to physical tables or views. Complex queries may be constructed on the fly. Large datasets may be split over different tables yet with the same column layout making views a difficult approach to use. Group by clauses may change depending on the users reporting needs. There are quite a few scenarios when dealing with complex reporting and data requirements such as controlling market risk, calculating profit and loss figures or else complex ad-hoc reporting needs. What options do we have when there is need to encapsulate the reporting logic in an Java EE component such as a CDI Bean or an EJB and offering a service interface for remote clients?

  • Executing a native query on an entity manager instance and handling a list of lists of objects?
  • Retrieve the JDBC Connection Object and do some “magic”?
  • Resort to Spring or even MyBatis?

JPA is a very elegant solution when it comes to CRUD (create, read update delete). What is less known: JPA also has something in stock for this more complex querying and reporting need!

Entities without a Table

Usually, you define a JPA entity on a Java POJO with the annotations @Entity and @Table (this could also be defined in a xml mapping file – I’ll use annotations in this post).

An entity may also be simply used as the result row of an arbitrary native query. And yes: native queries are the right tool for the job.

Keep in mind that, as Gavin King (the inventor of Hibernate) stated in a remarkable comment, ‘Systems like Hibernate [and JPA] are designed as “leaky abstractions”‘ (see here). In our case, the “leaky abstraction” is the possibility to use native SQL in JPA and therefore by-passing the ORM layer. It is possible and wanted to use plain SQL in situations where the ORM is not the right tool for the job. What’s nice: a native query even does the mapping to a POJO of the query results for you.

The population of the POJO and mapping of result set columns to POJO properties is done by your JPA provider. Just do not annotate your entity with the @Table tag! In the example below I use a much stripped down entity that represents a “tableless” entity holding P&L data (Profit and Loss) by portfolioId and valDate (valuation date).

public class PortfolioAggregation implements Serializable {
    private static final long serialVersionUID = 1L;
    //This resembles the business key:
    private Date valDate;
    private long portfolioId;
    //KEY - END
    private double plNet;
    private double plGross;
    public PortfolioAggregation() {
    public Date getValDate() {
        return valDate;
    public void setValDate(Date valDate) {
        this.valDate = valDate;
    public long getPortfolioId() {
        return portfolioId;
    public void setPortfolioId(long portfolioId) {
        this.portfolioId = portfolioId;
    public double getPlNet() {
        return plNet;
    public void setPlNet(double plNet) {
        this.plNet = plNet;
    public double getPlGross() {
        return plGross;
    public void setPlGross(double plGross) {
        this.plGross = plGross;
    //Class definition truncated from here ... 

Now, when creating the native query, you simply submit in addition to the SQL, the class type of our entity that is to be populated from each resultset row of the query.

String sql = "select a.valDate valDate, a.portfolioId portfolioId, sum(plNet) plNet, sum(plGross) plGross FROM ..."; //SQL truncated!
Query q = entityManager.createNativeQuery(sql, PortfolioAggregation.class);@SuppressWarnings("unchecked")
List<PortfolioAggregation> aggs = q.getResultList();

The SQL is not complete. It’s just to show how it is basically used and actually much more complex. It is important that each column name from the result relates to a property name of the entity. As you may see, there are also sum() aggregate functions in the query. I used so-called alias names for the aggregat functions, that in turn relate to a property name of our PortfolioAggregation class. This information is needed, so that the JPA implementation is able to properly populate your entity instances. In my eyes a nice feature. Callers of the portfolio aggregation service that receive a list of PortfolioAggregation objects may use them freely: in reporting tools such as Jasper Reports, show the data in tables at the client side and even manipulate it (since setters are included – which is not necessary) to show changes in different scenarios or implement other methods that manipulate the data (transiently) according to your client needs. Also you should be aware that this entity is not persistable! It is, from a persistence perspective, read only.

What I do not show in this code snippet is the way how the actual SQL query is built. This sometimes can become messy. Database Views or Named native queries might help, but more often plain string concatenation is used :-(((

Currently I am looking for ways around this dilemma. In a short conversation on Twitter Lukas Eder pointed me to a possible solution: jOOQ. jOOQ delivers a fluent API to construct SQL Queries and results may be mapped to POJOs. Quite interesting and I’ll definitively have a look at it and blog about my experiences with it – stay tuned!


Multi-Tenancy with Java EE and JBoss

Multi-tenancy is a frequent requirement in large business applications. Besides the requirement to separate different clients or environments, multi-tenancy may also be andvantageous in terms of distributing your data client wise across multiple databases. Currently frameworks such as hibernate or other API’s are not of much help.

A large business application for market risk evaluation that needed services and a Java EE compatible Front-End is exactly designed in this way: each tenant has its own database. What makes it even more complex: each tenant has not only one database to handle its entire data needs, but one for each product class. Huge amounts of data are produced every night and now need to be analysed, eventually corrected and calculations re-triggered.

I used an entity manager delegate that looks up the needed entity manager for the current tenant/product class combination via JNDI on each call. This is almost transparent to the clients of the delegate, except that, and that should be clear, somehow an identifier for the tenant/product class tuple needs to be supplied.
JBoss JNDI lookup of entity managers works flawless as of Wildfly 8.0 or JBoss EAP 6.1.1. In earlier versions issues existed and the lookup failed: https://issues.jboss.org/browse/WFLY-299.

Here is the code for the multi-tenant entity manager:

public class MultiTenantEntityManager {

    private static final String JNDI_ENV = "java:app/entitymanager/";

    private SessionContext ctx;

    public MultiTenantEntityManager() {


    public Query createQuery(String dataSourceName, String string) {
        return entityManager(dataSourceName).createQuery(string);

    public <T> TypedQuery<T> createQuery(String dataSourceName, CriteriaQuery<T> cq) {
        return entityManager(dataSourceName).createQuery(cq);

    public <T> TypedQuery<T> createQuery(String dataSourceName, String string, Class<T> type) {
        return entityManager(dataSourceName).createQuery(string, type);

    public void detach(String dataSourceName, Object o) {

    public <T> T find(String dataSourceName, Class<T> type, Object o) {
        return entityManager(dataSourceName).find(type, o);

    public <T> T find(String dataSourceName, Class<T> type, Object o, Map<String, Object> map) {
        return entityManager(dataSourceName).find(type, o, map);

    public <T> T find(String dataSourceName, Class<T> type, Object o, LockModeType lmt) {
        return entityManager(dataSourceName).find(type, o, lmt);

    public <T> T find(String dataSourceName, Class<T> type, Object o, LockModeType lmt, Map<String, Object> map) {
        return entityManager(dataSourceName).find(type, o, lmt, map);


    private EntityManager entityManager(String dataSourceName) {

        final EntityManager entityManager = (EntityManager) ctx.lookup(JNDI_ENV + dataSourceName);

        if (entityManager == null) {
            throw new RuntimeException("Unknown data source name '" + dataSourceName + "'.");

        return entityManager;



I left out most of the overriden delegate methods in order to clearly outline the idea. Each method call to the delegate has an additional parameter: The name of the datasource. The name is a string that follows a simple schema: “tenant_productClass” and thus identifies the database we need to access. You are free to come up with your own naming scheme that suits your needs best.
I did tests with configurations that consisted of up to 700 datasource pools. JBoss EAP managed them without any difficulty (ok, at one point during testing the database servers gave up – but that was an easy fix ). The datasource’s name is also part of the JNDI lookup string for the real JPA entity manager. The delegate looks up first the entity manager in the JNDI tree and then calls the actual entity managers method with the supplied parameters.

Here is an example persistence.xml for a persistence unit (as there are many of them, I generate them by script):

  <persistence-unit name="tenant1_eq" transaction-type="JTA">
      <property name="hibernate.show_sql" value="true"/>
      <property name="hibernate.format_sql" value="true"/>
      <property name="hibernate.dialect" value="org.hibernate.dialect.SybaseASE15Dialect"/>
      <property name="hibernate.cache.use_second_level_cache" value="false"/>
      <property name="hibernate.jdbc.batch_size" value="50"/>
      <property name="hibernate.jdbc.batch_versioned_data" value="true"/>
      <property name="hibernate.order_inserts" value="true"/>
      <property name="hibernate.order_updates" value="true"/>
      <property name="hibernate.generate_statistics" value="true"/>
      <property name="jboss.entity.manager.jndi.name" value="java:app/entitymanager/tenant1_eq"/>
      <property name="jboss.entity.manager.factory.jndi.name" value="java:app/entitymanagerfactory/tenant1_eq"/>

The multi-tenant entity manager is injected into each bean that would normally have an entity manager injected. For each call, the datasource’s name, to which the entity manager references, has to be known as it is also part of the entity manager’s JNDI name. Below is a snippet of a boundary, that uses the multi-tenant entity manager.

public class AggregationConfigBean implements AggregationConfig {

    private static final Logger LOG = LoggerFactory.getLogger(AggregationConfigBean.class.getSimpleName());

    MultiTenantEntityManager entityManager;

     * Default constructor.
    public AggregationConfigBean() {

    public List<AggConfigMeta> findConfigMeta(String dataSourceName) {

        List<AggConfigMeta> res = Lists.newArrayList();

        Query q = entityManager.createNativeQuery(dataSourceName, "SELECT meta FROM ConfigMeta WHERE type='config'");
        List<String> result = q.getResultList();

        LOG.info("Found " + result.size() + " meta entries.");

        ObjectMapper mapper = new ObjectMapper();

        LOG.info("Instantiated ObjectMapper...");

        for (String meta : result) {
            try {
                res.add(mapper.readValue(meta, AggConfigMeta.class));
            } catch (IOException ex) {
                LOG.error("Failed reading AggConfig meta data.", ex);

        return res;




The usage is actually straight forward. This approach may also be used to handle database-sharding in a similiar way, see database-sharding.
It opens a window to a whole lot of new ideas on how to handle large amounts of data with Java EE.