In-memory data grid with Hazelcast

In my previous article, JPA caching with Hazelcast, Hibernate and Spring Boot, I described an example illustrating the use of Hazelcast as a Hibernate 2nd-level cache. One big disadvantage of that example was that entities could be cached only by their primary key. Caching JPA queries by other indices helped a little, but it did not solve the problem completely, because a query could not make use of entities that were already in the cache, even if they matched its criteria. In this article I'm going to show you a smarter solution to that problem, based on Hazelcast distributed queries.


Spring Boot has built-in auto-configuration for Hazelcast if the library is available on the application classpath and a @Bean of type Config is declared.

	@Bean
	Config config() {
		Config c = new Config();
		// cluster name and password credentials (example values)
		c.getGroupConfig().setName("dev").setPassword("dev-pass");
		// connection to Hazelcast Management Center (the URL is left empty here)
		ManagementCenterConfig mcc = new ManagementCenterConfig().setUrl("").setEnabled(true);
		c.setManagementCenterConfig(mcc);
		// custom serializer registration for the Employee entity
		SerializerConfig sc = new SerializerConfig().setTypeClass(Employee.class).setClass(EmployeeSerializer.class);
		c.getSerializationConfig().addSerializerConfig(sc);
		return c;
	}

In the code fragment above we declare the cluster name and password credentials, the connection parameters to Hazelcast Management Center, and the entity serialization configuration. The entity itself is pretty simple – it has an @Id and two fields used for searching: personId and company.

public class Employee implements Serializable {

	private static final long serialVersionUID = 3214253910554454648L;

	private Integer id;
	private Integer personId;
	private String company;

	public Integer getId() {
		return id;
	}

	public void setId(Integer id) {
		this.id = id;
	}

	public Integer getPersonId() {
		return personId;
	}

	public void setPersonId(Integer personId) {
		this.personId = personId;
	}

	public String getCompany() {
		return company;
	}

	public void setCompany(String company) {
		this.company = company;
	}

}


Every entity needs to have a serializer declared if it is to be inserted into and selected from the cache. There are some default serializers available inside the Hazelcast library, but I implemented a custom one for our sample. It is based on StreamSerializer, ObjectDataOutput and ObjectDataInput.

public class EmployeeSerializer implements StreamSerializer<Employee> {

	@Override
	public int getTypeId() {
		return 1;
	}

	@Override
	public void write(ObjectDataOutput out, Employee employee) throws IOException {
		out.writeInt(employee.getId());
		out.writeInt(employee.getPersonId());
		out.writeUTF(employee.getCompany());
	}

	@Override
	public Employee read(ObjectDataInput in) throws IOException {
		Employee e = new Employee();
		e.setId(in.readInt());
		e.setPersonId(in.readInt());
		e.setCompany(in.readUTF());
		return e;
	}

	@Override
	public void destroy() {
	}

}


There is also a DAO interface for interacting with the database. It has two search methods and extends Spring Data's CrudRepository.

public interface EmployeeRepository extends CrudRepository<Employee, Integer> {

	public Employee findByPersonId(Integer personId);
	public List<Employee> findByCompany(String company);

}


In this sample the Hazelcast instance is embedded into the application. When starting the Spring Boot application we have to provide the VM argument -DPORT, which is used for exposing the service's REST API. Hazelcast automatically detects other running member instances, and its own port is incremented out of the box. Here's the REST @Controller class with the exposed API.

@RestController
public class EmployeeController {

	private Logger logger = Logger.getLogger(EmployeeController.class.getName());

	@Autowired
	EmployeeService service;

	@GetMapping("/employees/person/{id}")
	public Employee findByPersonId(@PathVariable("id") Integer personId) {
		logger.info(String.format("findByPersonId(%d)", personId));
		return service.findByPersonId(personId);
	}

	@GetMapping("/employees/company/{company}")
	public List<Employee> findByCompany(@PathVariable("company") String company) {
		logger.info(String.format("findByCompany(%s)", company));
		return service.findByCompany(company);
	}

	@GetMapping("/employees/{id}")
	public Employee findById(@PathVariable("id") Integer id) {
		logger.info(String.format("findById(%d)", id));
		return service.findById(id);
	}

	@PostMapping("/employees")
	public Employee add(@RequestBody Employee emp) {
		logger.info(String.format("add(%s)", emp));
		return service.add(emp);
	}

}
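As a side note, Spring Boot does not pick up -DPORT by itself; one way to wire it up is to reference it from the configuration, e.g. in application.yml. This is a minimal sketch – the exact property wiring is my assumption, not shown in the original:

```yaml
server:
  port: ${PORT}   # taken from the -DPORT VM argument at startup
```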


The EmployeeService bean is injected into EmployeeController. Inside EmployeeService there is a simple implementation of switching between the Hazelcast cache instance and the Spring Data DAO @Repository. In every find method we first try to find the data in the cache; in case it's not there, we search for it in the database and then put the found entity into the cache.

@Service
public class EmployeeService {

	private Logger logger = Logger.getLogger(EmployeeService.class.getName());

	@Autowired
	EmployeeRepository repository;
	@Autowired
	HazelcastInstance instance;

	IMap<Integer, Employee> map;

	@PostConstruct
	public void init() {
		map = instance.getMap("employee");
		map.addIndex("company", true);
		logger.info("Employees cache: " + map.size());
	}

	public Employee findByPersonId(Integer personId) {
		Predicate predicate = Predicates.equal("personId", personId);
		logger.info("Employee cache find");
		Collection<Employee> ps = map.values(predicate);
		logger.info("Employee cached: " + ps);
		Optional<Employee> e = ps.stream().findFirst();
		if (e.isPresent())
			return e.get();
		logger.info("Employee cache find");
		Employee emp = repository.findByPersonId(personId);
		logger.info("Employee: " + emp);
		map.put(emp.getId(), emp);
		return emp;
	}

	public List<Employee> findByCompany(String company) {
		Predicate predicate = Predicates.equal("company", company);
		logger.info("Employees cache find");
		Collection<Employee> ps = map.values(predicate);
		logger.info("Employees cache size: " + ps.size());
		if (ps.size() > 0) {
			return new ArrayList<>(ps);
		}
		logger.info("Employees find");
		List<Employee> e = repository.findByCompany(company);
		logger.info("Employees size: " + e.size());
		e.parallelStream().forEach(it -> map.putIfAbsent(it.getId(), it));
		return e;
	}

	public Employee findById(Integer id) {
		Employee e = map.get(id);
		if (e != null)
			return e;
		e = repository.findOne(id);
		map.put(id, e);
		return e;
	}

	public Employee add(Employee e) {
		e = repository.save(e);
		map.put(e.getId(), e);
		return e;
	}

}
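The flow in findById is the classic cache-aside pattern. Stripped of the Hazelcast and Spring specifics, it can be sketched with plain JDK types – here a ConcurrentHashMap stands in for the IMap and a Function stands in for the repository; both stand-ins are my illustration, not the article's code:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Cache-aside sketch: try the cache, fall back to the "database",
// then populate the cache so the next lookup is served locally.
public class CacheAside {

	private final Map<Integer, String> cache = new ConcurrentHashMap<>();

	public String findById(Integer id, Function<Integer, String> repository) {
		String cached = cache.get(id);        // 1. cache lookup
		if (cached != null)
			return cached;
		String loaded = repository.apply(id); // 2. database fallback
		if (loaded != null)
			cache.put(id, loaded);            // 3. populate the cache
		return loaded;
	}

	public int cacheSize() {
		return cache.size();
	}
}
```

What Hazelcast adds on top of this is that map.values(predicate) lets the same fallback logic work for queries on non-key fields like company, not just for get(id).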


If you are interested in running the sample application, you can clone my repository on GitHub. In the person-service module there is the example for my previous article about the Hibernate 2nd-level cache with Hazelcast; in the employee-module there is the example for this article.


Let's start three instances of the employee service on different ports using the VM argument -DPORT. In the first figure, visible at the beginning of the article, these ports are 2222, 3333 and 4444. When starting the third and last service instance you should see the fragment visible below in the application logs. It means that a Hazelcast cluster of three members has been set up.

2017-05-09 23:01:48.127  INFO 16432 --- [ration.thread-0] c.h.internal.cluster.ClusterService      : []:5703 [dev] [3.7.7] 

Members [3] {
	Member []:5701 - 7a8dbf3d-a488-4813-a312-569f0b9dc2ca
	Member []:5702 - 494fd1ac-341b-451c-b585-1ad58a280fac
	Member []:5703 - 9750bd3c-9cf8-48b8-a01f-b14c915937c3 this
}

Here is a picture from Hazelcast Management Center for two running members (only two members are visible because the freeware version of Hazelcast Management Center is limited to two).


Then run the Docker containers with MySQL and Hazelcast Management Center.

docker run -d --name mysql -p 33306:3306 mysql
docker run -d --name hazelcast-mgmt -p 38080:8080 hazelcast/management-center:latest

Now you can try calling the endpoint http://localhost:/employees/company/{company} on all of your service instances. You should see that the data is cached in the cluster, and even if you call the endpoint on a different instance, it finds entities put into the cache by another one. After several attempts my service instances put about 100k entities into the cache, distributed 50/50 between the two Hazelcast members.


Final Words

We could probably implement a smarter solution to the problem described in this article, but I just wanted to show you the idea. I tried to use Spring Data Hazelcast for it, but I had a problem getting it to run in a Spring Boot application. It has a HazelcastRepository interface, which is something similar to Spring Data's CrudRepository, but based on entities cached in the Hazelcast grid; it also uses the Spring Data KeyValue module. The project is not well documented and, like I said before, it didn't work with Spring Boot, so I decided to implement my own simple solution 🙂

In my local environment, visualized at the beginning of the article, queries on the cache were about 10 times faster than similar queries on the database; I had inserted 2M records into the employee table. A Hazelcast data grid can be not only a 2nd-level cache but even a middleware layer between your application and the database. If your priority is the performance of queries on large amounts of data and you have a lot of RAM, an in-memory data grid is the right solution for you 🙂
