Image by Markus Spiske via Unsplash (copyright-free)
With the objective of extracting behavioural representations, or profiles, of our customers in real time, we face the challenge of designing an AI architecture that responds reliably to rapid changes in the demand for streaming data. We tackle this challenge with a dynamic architecture composed of locally elastic AI microservices. By dynamically matching resource supply to demand, we aim to meet the throughput and latency requirements of the target real-time system. This "real time serverless" architecture facilitates the implementation of Distributed Deep Neural Systems that artificially "learn" the desired embeddings in real time.
Our Dynamic AI Microservice Architecture (DAIMA) has the following key elements and properties:
- Microservices: Small, autonomous, self-contained services.
- Massive Parallelisation: Computational elements and functions run with high degrees of parallelism.
- Elasticity: Computational resources are provisioned and de-provisioned autonomously so that supply matches demand.
- Event-driven: Operation is triggered by events, defined as time-stamped, immutable records of changes in transactional state.
- Resilience: The system recovers quickly if some of its components fail.
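To make the event-driven property concrete, here is a minimal Python sketch of an event as a time-stamped, immutable record, with current state derived by replaying the event log rather than by editing stored records. The `Event` fields, the event kinds and the `total_paid` helper are hypothetical illustrations, not part of the DAIMA itself.

```python
from dataclasses import FrozenInstanceError, dataclass, field
from datetime import datetime, timezone
from typing import Iterable


@dataclass(frozen=True)
class Event:
    """Time-stamped, immutable record of a change in transactional state."""
    customer_id: str
    kind: str                 # hypothetical kinds, e.g. "session_started", "payment_made"
    amount: float = 0.0
    timestamp: datetime = field(default_factory=lambda: datetime.now(timezone.utc))


def total_paid(events: Iterable[Event], customer_id: str) -> float:
    """Derive current state by replaying the log instead of mutating stored records."""
    return sum((e.amount for e in events
                if e.customer_id == customer_id and e.kind == "payment_made"), 0.0)


log = [
    Event("c1", "session_started"),
    Event("c1", "payment_made", 120.0),
    Event("c1", "payment_made", 30.0),
    Event("c2", "payment_made", 50.0),
]
print(total_paid(log, "c1"))  # 150.0

try:
    log[1].amount = 0.0       # attempting to rewrite history...
except FrozenInstanceError:
    print("events are immutable")  # ...is rejected by the frozen dataclass
```

Because events are never edited, any service can rebuild its view of a customer from the same log, which is also what makes recovery after a failure straightforward.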
The DAIMA runs on a Kubernetes substrate that provides coarse-grained microservice capabilities, while fine-grained control and elasticity are achieved with the Ray framework . Additionally, high levels of service autonomy and localised elasticity can be reached by adopting an event-driven approach  in which:
- Services communicate asynchronously through an event bus, and
- Task synchronisation takes the form of a choreography  .
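The following sketch illustrates asynchronous communication over an event bus combined with choreography: each service subscribes to the events it cares about and publishes new ones, and no central orchestrator sequences the workflow. It assumes an in-process `asyncio` bus as a stand-in for a real broker such as Kafka; the topic names, payloads and the three services are hypothetical.

```python
import asyncio
from collections import defaultdict
from typing import Awaitable, Callable, Dict, List, Set

Handler = Callable[[dict], Awaitable[None]]


class EventBus:
    """Minimal in-process stand-in for a message broker."""

    def __init__(self) -> None:
        self._subscribers: Dict[str, List[Handler]] = defaultdict(list)
        self._tasks: Set[asyncio.Task] = set()

    def subscribe(self, topic: str, handler: Handler) -> None:
        self._subscribers[topic].append(handler)

    def publish(self, topic: str, event: dict) -> None:
        # Fire-and-forget: the publisher never waits for its consumers.
        for handler in self._subscribers[topic]:
            task = asyncio.create_task(handler(event))
            self._tasks.add(task)
            task.add_done_callback(self._tasks.discard)

    async def drain(self) -> None:
        # Wait until no handler is running (handlers may publish further events).
        while self._tasks:
            await asyncio.gather(*list(self._tasks))


async def main() -> List[str]:
    bus = EventBus()
    trace: List[str] = []

    # Each service only knows what it reacts to and what it emits (choreography).
    async def feature_service(event: dict) -> None:
        trace.append("features")
        bus.publish("features_ready", {**event, "features": [event["duration"] / 60]})

    async def model_service(event: dict) -> None:
        trace.append("prediction")
        bus.publish("prediction_made", {**event, "score": 0.5 * event["features"][0]})

    async def sink_service(event: dict) -> None:
        trace.append(f"stored score={event['score']}")

    bus.subscribe("session_ended", feature_service)
    bus.subscribe("features_ready", model_service)
    bus.subscribe("prediction_made", sink_service)

    bus.publish("session_ended", {"customer_id": "c1", "duration": 600})
    await bus.drain()
    return trace


print(asyncio.run(main()))  # ['features', 'prediction', 'stored score=5.0']
```

Adding a step to this workflow means subscribing a new service to an existing topic rather than editing a coordinator, which is the autonomy that choreography buys.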
We also embrace the idea of “streaming endpoints” to serve real-time, AI-powered predictions on streaming data at high throughput .
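A streaming endpoint consumes events continuously and scores them in micro-batches rather than answering one request at a time. The sketch below assumes a plain Python iterable as the stream and a hypothetical batch `predict` callable; flushing on either batch size or batch age is a common way to trade latency against throughput.

```python
import time
from typing import Callable, Dict, Iterable, Iterator, List, Sequence


def micro_batches(stream: Iterable[dict], max_size: int = 32,
                  max_wait_s: float = 0.05) -> Iterator[List[dict]]:
    """Group a stream into micro-batches, flushing when a batch is full or old enough."""
    batch: List[dict] = []
    opened = time.monotonic()
    for event in stream:
        if not batch:
            opened = time.monotonic()  # a batch's age counts from its first event
        batch.append(event)
        # The age check only runs when an event arrives; a real consumer would
        # also flush on a timer (e.g. a poll timeout) during quiet periods.
        if len(batch) >= max_size or time.monotonic() - opened >= max_wait_s:
            yield batch
            batch = []
    if batch:  # flush the tail when the stream ends
        yield batch


def streaming_endpoint(stream: Iterable[dict],
                       predict: Callable[[Sequence[dict]], List[float]],
                       max_size: int = 32, max_wait_s: float = 0.05) -> Iterator[Dict]:
    """Score events in micro-batches and emit one prediction per event, in order."""
    for batch in micro_batches(stream, max_size, max_wait_s):
        for event, score in zip(batch, predict(batch)):
            yield {"customer_id": event["customer_id"], "prediction": score}


batch_sizes: List[int] = []


def dummy_model(batch: Sequence[dict]) -> List[float]:
    """Hypothetical stand-in for a trained model's batch predict call."""
    batch_sizes.append(len(batch))
    return [0.1 * e["clicks"] for e in batch]


events = ({"customer_id": f"c{i}", "clicks": i} for i in range(10))
out = list(streaming_endpoint(events, dummy_model, max_size=4, max_wait_s=10.0))
print(batch_sizes)  # [4, 4, 2]
print(out[0])       # {'customer_id': 'c0', 'prediction': 0.0}
```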
As highlighted above, a critical objective is to design a highly elastic DAIMA that adapts rapidly to load changes. This is particularly important for real-time systems, which must sustain high throughput and low latency under changing load. Moreover, as explained in , speed and precision are the “qualifying traits” of elasticity, and keeping resource provisioning and de-provisioning in step with demand variation is what delivers the target frugality.
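One way to reason about speed and precision is a small simulation: a target-tracking rule computes how many replicas the current load needs, supply lags that decision by a provisioning delay, and under- and over-provisioning are averaged over the trace. The capacity per replica, the replica bounds and the load trace are hypothetical, and the two averages only loosely follow the under/over-provisioning view of elasticity precision rather than any exact formula from the cited work.

```python
import math
from typing import List, Tuple


def desired_replicas(load: float, capacity_per_replica: float,
                     min_replicas: int = 1, max_replicas: int = 50) -> int:
    """Target-tracking rule: just enough replicas to absorb the current load."""
    needed = math.ceil(load / capacity_per_replica)
    return max(min_replicas, min(max_replicas, needed))


def simulate(loads: List[float], capacity_per_replica: float, delay_steps: int) -> List[int]:
    """Supply follows each scaling decision only after `delay_steps` (provisioning time)."""
    decisions = [desired_replicas(load, capacity_per_replica) for load in loads]
    # For t < delay_steps, assume the initial allocation equals the first decision.
    return [decisions[max(0, t - delay_steps)] for t in range(len(loads))]


def provisioning_precision(loads: List[float], supply: List[int],
                           capacity_per_replica: float) -> Tuple[float, float]:
    """Average under- and over-provisioning (in replicas) across the trace."""
    demand = [desired_replicas(load, capacity_per_replica) for load in loads]
    under = sum(max(0, d - s) for d, s in zip(demand, supply)) / len(loads)
    over = sum(max(0, s - d) for d, s in zip(demand, supply)) / len(loads)
    return under, over


# Hypothetical load trace (requests/s), each replica assumed to serve 100 requests/s.
loads = [100, 100, 400, 800, 800, 300, 100, 100]
for delay in (0, 2):
    supply = simulate(loads, 100.0, delay)
    print(delay, supply, provisioning_precision(loads, supply, 100.0))
# 0 [1, 1, 4, 8, 8, 3, 1, 1] (0.0, 0.0)
# 2 [1, 1, 1, 1, 4, 8, 8, 3] (1.75, 1.75)
```

With a two-step provisioning delay, the system is short of capacity while the load ramps up and pays for idle replicas while it falls; shortening the delay (speed) is what drives both errors (precision) towards zero.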
An approach similar to the one presented here is described in . There, the Ray framework was combined with Apache Kafka to implement an efficient, scalable distributed stream processing infrastructure, which made it possible to process moving-image recognition tasks according to the amount of data transmitted.
Distributed Deep Neural Systems to artificially learn Behavioural Profiles through Embeddings
With the DAIMA acting as a "real time serverless" solution, we can focus on our driving challenge: extracting, or "learning", behavioural profiles of our customers from real-time data. To do this, we start by segmenting customers on session duration, a performance metric related to session behaviour. The parameters of a Weibull distribution fitted to session durations can be used to characterise session behaviour . This approach can be extended to obtain session duration, total-to-pay and pay-rate distributions segmented by corridor.
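The Weibull characterisation can be sketched as a maximum-likelihood fit per corridor. To stay dependency-free, this version solves the likelihood equation for the shape parameter by bisection instead of calling a statistics library (`scipy.stats.weibull_min.fit` with `floc=0` is a common alternative); the corridor codes and the synthetic session durations are hypothetical.

```python
import math
import random
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def fit_weibull(samples: List[float], tol: float = 1e-8) -> Tuple[float, float]:
    """Maximum-likelihood (shape k, scale lam) of a two-parameter Weibull distribution."""
    x_max = max(samples)
    z = [x / x_max for x in samples]      # the shape MLE is scale-invariant; rescaling avoids overflow
    logs = [math.log(v) for v in z]
    mean_log = sum(logs) / len(z)

    def score(k: float) -> float:
        # Negated per-sample derivative of the profile log-likelihood: increasing in k, zero at the MLE.
        zk = [v ** k for v in z]
        return sum(a * b for a, b in zip(zk, logs)) / sum(zk) - 1.0 / k - mean_log

    lo, hi = 1e-3, 50.0                   # bracket for the shape parameter
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if score(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    k = 0.5 * (lo + hi)
    # Closed-form scale given k: lam = (mean of x^k)^(1/k), undoing the rescaling.
    lam = x_max * (sum(v ** k for v in z) / len(z)) ** (1.0 / k)
    return k, lam


def fit_by_corridor(sessions: Iterable[Tuple[str, float]]) -> Dict[str, Tuple[float, float]]:
    """Segment session durations by corridor and fit one Weibull per segment."""
    groups: Dict[str, List[float]] = defaultdict(list)
    for corridor, duration in sessions:
        groups[corridor].append(duration)
    return {corridor: fit_weibull(durations) for corridor, durations in groups.items()}


# Hypothetical corridors with known (shape, scale) so that the fit can be checked.
random.seed(0)
TRUE_PARAMS = {"GB-PH": (0.8, 120.0), "US-MX": (1.5, 300.0)}
sessions = [(corridor, random.weibullvariate(scale, shape))  # alpha=scale, beta=shape
            for corridor, (shape, scale) in TRUE_PARAMS.items()
            for _ in range(3000)]
for corridor, (k, lam) in fit_by_corridor(sessions).items():
    print(f"{corridor}: shape={k:.2f}, scale={lam:.1f}")
```

A shape below 1 indicates that the longer a session has lasted, the less likely it is to end soon, whereas a shape above 1 indicates the opposite; comparing (shape, scale) pairs across corridors is one way to turn durations into segment profiles.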
To evaluate the contribution of other sources of information, Gradient Boosted Decision Trees, a powerful machine learning algorithm, can be used to estimate the relative importance of several session-related features for predicting session duration.
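To illustrate how gradient boosting yields relative feature importances, the sketch below boosts depth-one trees (stumps) on squared-error residuals and credits each split's error reduction to the feature it splits on, which is the idea behind impurity-based importances such as scikit-learn's `GradientBoostingRegressor.feature_importances_`. It is a pure-Python stand-in, not a production GBDT; the session features (pages viewed, hour of day, a pure-noise column) and the synthetic durations are hypothetical.

```python
import random
from typing import List, Tuple

Stump = Tuple[float, int, float, float, float]  # (gain, feature, threshold, left value, right value)


def best_stump(X: List[List[float]], r: List[float]) -> Stump:
    """Depth-one regression tree on residuals r that maximises the squared-error reduction."""
    n = len(r)
    total = sum(r)
    best: Stump = (0.0, -1, 0.0, 0.0, 0.0)
    for j in range(len(X[0])):
        order = sorted(range(n), key=lambda i: X[i][j])
        left_sum = 0.0
        for pos in range(1, n):
            prev, nxt = order[pos - 1], order[pos]
            left_sum += r[prev]
            if X[prev][j] == X[nxt][j]:
                continue  # no threshold separates equal values
            right_sum = total - left_sum
            # SSE(parent) - SSE(children) for a split with `pos` samples on the left
            gain = left_sum ** 2 / pos + right_sum ** 2 / (n - pos) - total ** 2 / n
            if gain > best[0]:
                threshold = 0.5 * (X[prev][j] + X[nxt][j])
                best = (gain, j, threshold, left_sum / pos, right_sum / (n - pos))
    return best


def boosted_feature_importance(X: List[List[float]], y: List[float],
                               n_rounds: int = 100, learning_rate: float = 0.1) -> List[float]:
    """Gradient boosting with squared loss; importance = total split gain per feature, normalised."""
    pred = [sum(y) / len(y)] * len(y)
    importance = [0.0] * len(X[0])
    for _ in range(n_rounds):
        residuals = [t - p for t, p in zip(y, pred)]  # negative gradient of the squared loss
        gain, j, threshold, left, right = best_stump(X, residuals)
        if j < 0:
            break  # no split improves the fit
        importance[j] += gain
        pred = [p + learning_rate * (left if row[j] <= threshold else right)
                for p, row in zip(pred, X)]
    total = sum(importance)
    return [v / total for v in importance] if total > 0 else importance


# Hypothetical session features: pages viewed, hour of day and a pure-noise column.
random.seed(1)
X, y = [], []
for _ in range(300):
    pages, hour, noise = random.uniform(0, 20), random.uniform(0, 24), random.random()
    X.append([pages, hour, noise])
    y.append(30 * pages + 2 * hour + random.gauss(0, 5))  # synthetic session duration

for name, share in zip(["pages", "hour", "noise"], boosted_feature_importance(X, y)):
    print(f"{name}: {share:.3f}")
```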
Extending this multi-factor characterisation of customer behaviour, neural networks can be trained to predict session duration from data fed in micro-batches. Ongoing work focuses on leveraging the DAIMA to implement Distributed Deep Learning for obtaining behavioural embeddings. This is a continuation of the work presented in .
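A minimal form of micro-batch training is a network that takes one gradient step per incoming batch, with its hidden layer doubling as a candidate behavioural embedding. This NumPy sketch assumes standardised numeric session features and a synthetic target; the architecture (one tanh hidden layer, plain SGD) and the `SessionDurationNet` name are illustrative choices, not the design used in the ongoing work, and it runs on a single process rather than distributed.

```python
import numpy as np


class SessionDurationNet:
    """One-hidden-layer regressor updated incrementally, one micro-batch at a time."""

    def __init__(self, n_features: int, n_hidden: int = 8, lr: float = 0.05, seed: int = 0):
        rng = np.random.default_rng(seed)
        self.W1 = rng.normal(0.0, 1.0 / np.sqrt(n_features), (n_features, n_hidden))
        self.b1 = np.zeros(n_hidden)
        self.W2 = rng.normal(0.0, 1.0 / np.sqrt(n_hidden), (n_hidden, 1))
        self.b2 = np.zeros(1)
        self.lr = lr

    def embed(self, X: np.ndarray) -> np.ndarray:
        """Hidden-layer activations, usable as a candidate behavioural embedding."""
        return np.tanh(X @ self.W1 + self.b1)

    def predict(self, X: np.ndarray) -> np.ndarray:
        return (self.embed(X) @ self.W2 + self.b2).ravel()

    def partial_fit(self, X: np.ndarray, y: np.ndarray) -> float:
        """One SGD step on the micro-batch's mean squared error; returns the pre-update loss."""
        h = self.embed(X)
        err = (h @ self.W2 + self.b2).ravel() - y
        grad_out = (2.0 / len(y)) * err[:, None]          # dLoss/dPrediction, shape (batch, 1)
        grad_h = (grad_out @ self.W2.T) * (1.0 - h ** 2)  # back-propagate through tanh
        self.W2 -= self.lr * (h.T @ grad_out)
        self.b2 -= self.lr * grad_out.sum(axis=0)
        self.W1 -= self.lr * (X.T @ grad_h)
        self.b1 -= self.lr * grad_h.sum(axis=0)
        return float(np.mean(err ** 2))


# Hypothetical stream: 500 micro-batches of 32 sessions with 3 standardised features each.
rng = np.random.default_rng(42)
net = SessionDurationNet(n_features=3)
losses = []
for _ in range(500):
    X = rng.normal(size=(32, 3))
    y = 1.5 * X[:, 0] - 0.5 * X[:, 1] + 0.1 * rng.normal(size=32)  # synthetic target
    losses.append(net.partial_fit(X, y))

print(f"first 20 batches: {np.mean(losses[:20]):.3f}, last 20: {np.mean(losses[-20:]):.3f}")
print(net.embed(X[:2]).shape)  # (2, 8)
```

Because `partial_fit` only needs the current batch, the same loop can sit behind a streaming consumer; distributing it would additionally require synchronising gradients or parameters across workers.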
We have described the foundational elements of a system that learns real-time behavioural profiles of customers by leveraging a Dynamic AI Microservice Architecture (DAIMA). Ongoing work focuses on integrating a Distributed Deep Neural System to artificially learn the desired representations in real time.
References
- Advantages of an event driven architecture pattern. IBM Developer. https://developer.ibm.com/technologies/messaging/articles/advantages-of-an-event-driven-architecture/
- Architectural considerations for event-driven microservices-based systems. IBM Developer. https://developer.ibm.com/depmodels/microservices/articles/eda-and-microservices-architecture-best-practices/
- Web services vs. streaming for real-time machine learning endpoints. Towards Data Science. https://towardsdatascience.com/web-services-vs-streaming-for-real-time-machine-learning-endpoints-c08054e2b18e
- Bahadori, Kiyana & Vardanega, Tullio (2018). Designing and Implementing Elastically Scalable Services: A State-of-the-art Technology Review. 557-564.
- Herbst, Nikolas et al. (2013). Elasticity in cloud computing: What it is, and what it is not. International Conference on Autonomic Computing (ICAC). 23-27.
- Kato, Kasumi et al. (2019). Construction Scheme of a Scalable Distributed Stream Processing Infrastructure Using Ray and Apache Kafka. CATA.
- Liu, Chao & White, Ryen & Dumais, Susan (2010). Understanding Web Browsing Behaviors through Weibull Analysis of Dwell Time. Proceedings of the 33rd International ACM SIGIR Conference on Research and Development in Information Retrieval. 379-386. doi:10.1145/1835449.1835513
- Machine Learning for Actionable Behavioural Clustering at WorldRemit. WorldRemit blog. https://blog.worldremit.com/machine-learning-for-actionable-behavioural-clustering-at-world-remittitled/
- Machine learning is going real-time. https://huyenchip.com/2020/12/27/real-time-machine-learning.html
- Chamberlain, Benjamin & Cardoso, Ângelo & Liu, C.H. & Pagliari, Roberto & Deisenroth, Marc (2017). Customer Lifetime Value Prediction Using Embeddings. 1753-1762.