Embracing a Dynamic AI Microservice Architecture to Learn Behavioural Customer Profiles in Real Time


Image by Markus Spiske via Unsplash (copyright-free)

To extract behavioural representations, or profiles, of our customers in real time, we need an AI architecture that responds reliably to rapid changes in the demand generated by streaming data. We tackle this challenge with a dynamic architecture built from locally elastic AI microservices. By dynamically matching resource supply to demand, we aim to meet the throughput and latency requirements of the target real-time system. This "real time serverless" architecture facilitates the implementation of Distributed Deep Neural Systems that "learn" the desired embeddings in real time.

Embracing DAIMA

The Architecture of Dynamic AI Microservices, or DAIMA, that we use has the following key elements and properties:

  1. Micro-services: Small, autonomous, self-contained.
  2. Massive Parallelisation: Computational elements and functions provided with high levels of parallelisation.
  3. Elasticity: Autonomous provisioning and de-provisioning of computational resources matching supply and demand.
  4. Event-driven: Operation driven by events that can be defined as time-stamped and immutable records of changes in transactional state.
  5. Resilience: The system recovers quickly if some of its components fail.

The DAIMA runs on a Kubernetes substrate that provides coarse-grained microservice capabilities, while fine-grained control and elasticity are achieved with the Ray framework [1]. Additionally, high levels of service autonomy and localised elasticity can be reached by adopting an event-driven approach [2] where:

  1. Communication between services is performed asynchronously through an event bus, and
  2. Tasks are synchronised in the form of a choreography [3].
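
The two points above can be illustrated with a minimal in-memory sketch. It is not the DAIMA implementation: the `EventBus` class, the topic names and the two services are hypothetical, and a production system would use a real message broker rather than `asyncio` tasks in a single process. The point is only to show services reacting to each other's events without a central orchestrator.

```python
import asyncio
import time
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Awaitable, Callable


@dataclass(frozen=True)
class Event:
    """A time-stamped record; the dataclass is frozen so fields cannot be reassigned."""
    topic: str
    payload: dict
    timestamp: float = field(default_factory=time.time)


class EventBus:
    """Hypothetical in-memory bus: publishers never call subscribers directly."""

    def __init__(self) -> None:
        self._handlers = defaultdict(list)
        self._pending = set()

    def subscribe(self, topic: str, handler: Callable[[Event], Awaitable[None]]) -> None:
        self._handlers[topic].append(handler)

    def publish(self, topic: str, payload: dict) -> None:
        # Schedule each handler as its own task and return immediately,
        # so the publisher does not wait for consumers.
        event = Event(topic, payload)
        for handler in self._handlers[topic]:
            task = asyncio.create_task(handler(event))
            self._pending.add(task)
            task.add_done_callback(self._pending.discard)

    async def drain(self) -> None:
        # Wait until every scheduled handler, including those triggered
        # by other handlers, has finished.
        while self._pending:
            await asyncio.gather(*list(self._pending))


async def run_demo() -> list:
    bus = EventBus()
    profiles = []

    async def extract_features(event: Event) -> None:
        # Reacts to a closed session and emits a new event of its own.
        duration = event.payload["end"] - event.payload["start"]
        bus.publish("features.ready",
                    {"customer": event.payload["customer"], "duration": duration})

    async def update_profile(event: Event) -> None:
        # Reacts to extracted features; knows nothing about the extractor.
        profiles.append((event.payload["customer"], event.payload["duration"]))

    bus.subscribe("session.closed", extract_features)
    bus.subscribe("features.ready", update_profile)
    bus.publish("session.closed", {"customer": "c-1", "start": 10.0, "end": 55.0})
    await bus.drain()
    return profiles


if __name__ == "__main__":
    print(asyncio.run(run_demo()))
```

Each service only knows the topics it listens to and the topics it emits, which is what allows services to be scaled or replaced independently.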

Components of DAIMA (powered by Ray)

We are also embracing the idea of “streaming endpoints” to provide real-time, AI-powered predictions from data at high throughput [4].

As highlighted above, a critical objective is to design a highly elastic DAIMA that can adapt rapidly to load changes. This is particularly crucial for real-time systems, which must maintain high throughput and low latency under changing load conditions. Additionally, as explained in [5], speed and precision are “qualifying traits” of elasticity, and synchronising resource provisioning and de-provisioning with variations in demand leads to the target frugality.
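
As a rough illustration of matching supply to demand, the sketch below applies a simple proportional scaling rule, similar in spirit to the one used by the Kubernetes Horizontal Pod Autoscaler. The function name, parameters and default values are assumptions made for this example; the article does not describe the actual scaling policy used in the DAIMA.

```python
import math


def desired_workers(current_workers: int,
                    observed_rate: float,
                    target_rate_per_worker: float,
                    min_workers: int = 1,
                    max_workers: int = 64,
                    tolerance: float = 0.1) -> int:
    """Return how many workers are needed to bring per-worker load back to target.

    observed_rate is the total number of events per second across all workers;
    target_rate_per_worker is the load one worker should ideally handle.
    """
    per_worker = observed_rate / current_workers
    ratio = per_worker / target_rate_per_worker
    # Ignore small deviations to avoid constant scaling up and down.
    if abs(ratio - 1.0) <= tolerance:
        return current_workers
    desired = math.ceil(current_workers * ratio)
    # Keep the result within the allowed resource bounds.
    return max(min_workers, min(max_workers, desired))


if __name__ == "__main__":
    print(desired_workers(4, observed_rate=4000.0, target_rate_per_worker=500.0))
    print(desired_workers(8, observed_rate=1000.0, target_rate_per_worker=500.0))
```

In this sketch the tolerance band trades a little precision for stability, while how often the rule is evaluated determines how quickly the system reacts to load changes.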

An approach similar to the one presented here is described in [7]. There, the Ray framework was used in combination with Apache Kafka to implement a highly efficient and scalable distributed stream processing infrastructure. This made it possible to process moving-image recognition tasks according to the amount of data transmitted.

Distributed Deep Neural Systems to artificially learn Behavioural Profiles through Embeddings

Leveraging the DAIMA as a "real time serverless" solution, we can focus on our driving challenge: extracting, or "learning", behavioural profiles of our customers from real-time data. To do this, we start by segmenting customers by session duration, a performance metric related to session behaviour. We can use Weibull distribution parameters to characterise session behaviour [8]. This can be extended to find the distributions of session duration, total-to-pay and pay-rate, segmented by corridor.
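
To make the Weibull characterisation concrete, here is a minimal sketch that estimates the shape and scale parameters of a sample of session durations by maximum likelihood, using only the standard library. The data is synthetic and the function name `fit_weibull` is hypothetical; a statistics library would normally be used instead of the hand-written bisection shown here.

```python
import math
import random


def fit_weibull(durations, lo=1e-3, hi=50.0, iters=60):
    """Maximum-likelihood estimate of (shape, scale) for positive durations."""
    if any(d <= 0 for d in durations):
        raise ValueError("durations must be strictly positive")
    # Rescale by the maximum so that powers never overflow;
    # the shape estimate is unaffected by rescaling.
    longest = max(durations)
    y = [d / longest for d in durations]
    logs = [math.log(v) for v in y]
    mean_log = sum(logs) / len(logs)

    def score(k):
        # Derivative of the log-likelihood with respect to the shape
        # (up to a positive factor); it is increasing in k.
        w = [v ** k for v in y]
        return sum(wi * li for wi, li in zip(w, logs)) / sum(w) - 1.0 / k - mean_log

    # Bisection on the shape parameter.
    for _ in range(iters):
        mid = (lo + hi) / 2.0
        if score(mid) > 0:
            hi = mid
        else:
            lo = mid
    shape = (lo + hi) / 2.0
    scale = longest * (sum(v ** shape for v in y) / len(y)) ** (1.0 / shape)
    return shape, scale


# Synthetic session durations in seconds (assumed values, for illustration only).
random.seed(7)
sessions = [random.weibullvariate(120.0, 0.8) for _ in range(5000)]
shape, scale = fit_weibull(sessions)
print(f"shape={shape:.2f}, scale={scale:.1f}")
```

A shape below 1 means that the chance of a session ending decreases the longer it has lasted, while a shape above 1 means the opposite, so the pair (shape, scale) gives a compact per-segment summary.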

To evaluate the contribution of other sources of information, Gradient Boosted Decision Trees, a powerful Machine Learning algorithm, can be used to obtain the relative importance of several session-related features in predicting session duration.
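
The idea of reading feature importance from boosted trees can be sketched with a deliberately small, standard-library-only booster that uses depth-1 trees (stumps) and squared-error loss. The feature names and the synthetic data are hypothetical, and in practice a library implementation of gradient boosting would be used; this sketch only shows where the importance scores come from.

```python
import random


def best_stump(X, residuals):
    """Find the single split that most reduces the squared error of the residuals."""
    n = len(X)
    total = sum(residuals)
    best = None
    for j in range(len(X[0])):
        order = sorted(range(n), key=lambda i: X[i][j])
        left_sum = 0.0
        for pos in range(1, n):
            prev = order[pos - 1]
            left_sum += residuals[prev]
            if X[order[pos]][j] == X[prev][j]:
                continue  # cannot split between identical values
            right_sum = total - left_sum
            gain = (left_sum ** 2 / pos + right_sum ** 2 / (n - pos)
                    - total ** 2 / n)
            if best is None or gain > best[0]:
                threshold = (X[prev][j] + X[order[pos]][j]) / 2.0
                best = (gain, j, threshold, left_sum / pos, right_sum / (n - pos))
    return best


def fit_gbdt_importances(X, y, n_rounds=50, learning_rate=0.1):
    """Boost stumps on the residuals and return normalised per-feature gains."""
    n = len(y)
    pred = [sum(y) / n] * n
    gains = [0.0] * len(X[0])
    for _ in range(n_rounds):
        # For squared error, the negative gradient is the residual.
        residuals = [yi - pi for yi, pi in zip(y, pred)]
        stump = best_stump(X, residuals)
        if stump is None:
            break
        gain, j, threshold, left, right = stump
        gains[j] += gain
        for i in range(n):
            pred[i] += learning_rate * (left if X[i][j] <= threshold else right)
    total = sum(gains)
    return [g / total for g in gains] if total else gains


# Synthetic sessions with hypothetical features.
random.seed(1)
feature_names = ["pages_viewed", "is_mobile", "hour_of_day"]
X, y = [], []
for _ in range(300):
    pages, mobile, hour = random.randint(1, 10), random.randint(0, 1), random.randint(0, 23)
    X.append([pages, mobile, hour])
    y.append(30.0 * pages + 20.0 * mobile + random.gauss(0.0, 10.0))

importances = fit_gbdt_importances(X, y)
for name, value in zip(feature_names, importances):
    print(f"{name}: {value:.3f}")
```

Here the importance of a feature is the share of total error reduction obtained from splits on that feature, which is one common way of defining it.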

Extending this multi-factor approach to characterise customer behaviour, Neural Networks can be trained to predict session duration with data fed in micro-batches. Ongoing work is focused on leveraging the power of a DAIMA to implement Distributed Deep Learning in the task of obtaining behavioural embeddings. This is a continuation of the work presented in [9].
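
Training on micro-batches can be sketched as follows, assuming a tiny one-hidden-layer network written in plain Python so that the example stays self-contained. The class `TinyMLP`, the synthetic stream and all hyperparameters are hypothetical; a real system would use a deep learning framework and distribute the work, which this sketch does not attempt.

```python
import math
import random


class TinyMLP:
    """One tanh hidden layer and a linear output, trained by SGD."""

    def __init__(self, n_in, n_hidden, seed=0):
        rng = random.Random(seed)
        self.w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_hidden)]
        self.b1 = [0.0] * n_hidden
        self.w2 = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]
        self.b2 = 0.0

    def forward(self, x):
        h = [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
             for row, b in zip(self.w1, self.b1)]
        return h, sum(w * hi for w, hi in zip(self.w2, h)) + self.b2

    def partial_fit(self, batch, lr=0.05):
        """One gradient step on a micro-batch; returns the MSE before the update."""
        gw1 = [[0.0] * len(row) for row in self.w1]
        gb1 = [0.0] * len(self.b1)
        gw2 = [0.0] * len(self.w2)
        gb2, loss = 0.0, 0.0
        for x, y in batch:
            h, out = self.forward(x)
            err = out - y  # gradient of 0.5 * err**2 with respect to the output
            loss += err * err
            gb2 += err
            for k, hk in enumerate(h):
                gw2[k] += err * hk
                dh = err * self.w2[k] * (1.0 - hk * hk)  # back-propagate through tanh
                gb1[k] += dh
                for i, xi in enumerate(x):
                    gw1[k][i] += dh * xi
        step = lr / len(batch)
        for k in range(len(self.w2)):
            self.w2[k] -= step * gw2[k]
            self.b1[k] -= step * gb1[k]
            for i in range(len(self.w1[k])):
                self.w1[k][i] -= step * gw1[k][i]
        self.b2 -= step * gb2
        return loss / len(batch)


def micro_batches(n_batches, batch_size, seed=0):
    """Stand-in for a stream: yields small batches of (features, scaled duration)."""
    rng = random.Random(seed)
    for _ in range(n_batches):
        batch = []
        for _ in range(batch_size):
            x = [rng.random(), rng.random()]
            y = 3.0 + x[0] - 0.5 * x[1] + rng.gauss(0.0, 0.05)
            batch.append((x, y))
        yield batch


model = TinyMLP(n_in=2, n_hidden=4)
losses = [model.partial_fit(batch) for batch in micro_batches(500, 16)]
print(f"first batch loss: {losses[0]:.3f}, last batch loss: {losses[-1]:.3f}")
```

Because the model is updated one micro-batch at a time, it never needs the full dataset in memory, which is what makes this style of training a natural fit for streaming data.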


We have described the foundational elements of a system that learns real-time behavioural profiles of customers by leveraging a Dynamic AI Microservice Architecture, or DAIMA. Ongoing work is focused on integrating a Distributed Deep Neural System to artificially learn the desired representations in real time.



[2] Advantages of an event driven architecture pattern

[3] Architectural considerations for event-driven microservices-based systems

[4] Web services vs. streaming for real-time machine learning endpoints

[5] Bahadori, Kiyana & Vardanega, Tullio. (2018). Designing and Implementing Elastically Scalable Services - A State-of-the-art Technology Review. 557-564.

[6] Herbst, Nikolas et al. (2013). Elasticity in cloud computing: What it is, and what it is not. International Conference on Autonomic Computing. 23-27.

[7] Kato, Kasumi et al. “Construction Scheme of a Scalable Distributed Stream Processing Infrastructure Using Ray and Apache Kafka.” CATA (2019).

[8] Liu, Chao & White, Ryen & Dumais, Susan. (2010). Understanding Web Browsing Behaviors through Weibull Analysis of Dwell Time. Proceedings of the 33rd International ACM SIGIR Conference on Research and Development in Information Retrieval. 379-386. 10.1145/1835449.1835513.

[9] Machine Learning for Actionable Behavioural Clustering at WorldRemit

[10] Machine learning is going real-time

[11] Chamberlain, Benjamin & Cardoso, Ângelo & Liu, C.H. & Pagliari, Roberto & Deisenroth, Marc. (2017). Customer Lifetime Value Prediction Using Embeddings. 1753-1762.