Hybrid nano-services and the rise of functional server-less dynamic architectures
The search for optimisation in highly connected networks.
I’m going to assume that the reader is already familiar with micro-services and the case against monoliths, so I won’t reproduce that here. I will also gloss over server-less computing for the sake of brevity.
What is a nano-service?
Nano-services can be thought of as roughly a single route in a typical web server: a hyper-focused piece of code that does one thing and is exposed via a single interface. We then link multiple nano-services together to extend their functionality. Because nano-services are small and specialised, we can define all the dependencies and architecture they need, in isolation, to accomplish their intended goal; they therefore tend to be easy to deploy thanks to their explicit and simple nature.
Nano-services allow multiple disparate teams to write them in parallel, since a team needs only an agreed-upon contract defining a nano-service’s inputs and outputs in order to use it. Developers can then chain these calls together, much as we pipe inputs and outputs in the terminal.
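The chaining idea can be sketched in a few lines of Node.js. Everything here is illustrative: the two handlers and the `pipe` helper are hypothetical, standing in for nano-services that agree only on an input/output contract.

```javascript
// Hypothetical sketch: two single-purpose nano-services composed like a
// shell pipeline. Each service only knows its input/output contract.

// Nano-service 1: normalise an email address.
const normaliseEmail = async (input) => ({
  email: input.email.trim().toLowerCase(),
});

// Nano-service 2: derive a display name from the email's local part.
const deriveDisplayName = async (input) => ({
  ...input,
  displayName: input.email.split("@")[0],
});

// A generic "pipe" that feeds each service's output into the next one's
// input, mirroring `cmd1 | cmd2` in the terminal.
const pipe = (...services) => async (input) => {
  let result = input;
  for (const service of services) {
    result = await service(result);
  }
  return result;
};

const handler = pipe(normaliseEmail, deriveDisplayName);
```

Calling `handler({ email: " Alice@Example.COM " })` resolves to `{ email: "alice@example.com", displayName: "alice" }`; either service could equally live behind its own endpoint.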
Nano-services naturally lend themselves to server-less and event-based architectures because of these characteristics.
The case against micro-services
Most developers at this point are already familiar with micro-services: small, self-contained APIs that concern themselves with a narrow domain of business knowledge. Many among us already use Kubernetes with Docker, or some kind of managed container SaaS offering such as AWS’s EKS or Fargate.
For most of us, there is no need to split things any further, and in fact it sometimes becomes less efficient to do so. As one splits an API into smaller and smaller pieces, delays are introduced by the constant hops across the interface boundaries between the APIs, and latency starts to climb sharply. Each new interconnected micro-service we add also slows down its downstream sub-network, as dependencies must call their own child dependencies over the network.
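A back-of-envelope model makes the latency argument concrete. The numbers below are illustrative assumptions, not benchmarks: a fixed compute cost for the work itself, plus a fixed overhead per network hop.

```javascript
// Toy latency model (illustrative numbers, not measurements):
// total latency = compute time + per-hop network overhead * number of hops.
const totalLatencyMs = (computeMs, hopOverheadMs, hops) =>
  computeMs + hopOverheadMs * hops;

// The same 40 ms of work, done in one service vs split across a chain.
const monolith = totalLatencyMs(40, 5, 1);    // 45 ms
const chainOfFour = totalLatencyMs(40, 5, 4); // 60 ms
```

The compute cost stays constant while the hop overhead scales with the depth of the call chain, which is why subdividing a service never makes an individual request faster on its own.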
While we can scale up the number of instances in a cluster to handle throughput, latency is not so easy to optimise from an architectural standpoint. We can optimise individual endpoints and their code paths in our APIs, but for reasonably well-written code the major contributing factor will be network IO.
In an event-based server-less system we also need to account for cold starts: multiple endpoints may each introduce an additional delay while we wait for them to spin up.
As in parallel computing, being cognisant of our data dependencies is fundamental to further optimisation. Each call that we make into our network is bounded by its critical path: the longest chain of dependent work that must complete before a result can be returned.
In the call graph above, the originating vertex 0 has to wait for the longest path, 0 → 1 → 2 → 3 → 4 → 3 → 2 → 1 → 0, to complete. (For simplicity’s sake we assume that each vertex takes the same time to compute and that each edge takes the same time to traverse.) We also tend to forget that each of these vertices requires resources to operate, so every vertex carries a base resource-cost allocation in the network.
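Under those same simplifying assumptions, the critical path is easy to compute. The `criticalPath` helper below is a sketch, not part of any real framework; it counts edges along the longest downstream chain.

```javascript
// Illustrative sketch: longest chain of edges below a vertex, assuming unit
// cost per vertex and per edge as in the example above.
const criticalPath = (graph, vertex) => {
  const children = graph[vertex] || [];
  if (children.length === 0) return 0;
  // One edge to reach the deepest child, plus that child's own longest chain.
  return 1 + Math.max(...children.map((c) => criticalPath(graph, c)));
};

// The linear chain 0 -> 1 -> 2 -> 3 -> 4 from the example.
const chain = { 0: [1], 1: [2], 2: [3], 3: [4], 4: [] };

// Four edges out and four edges back: eight traversals before vertex 0 can
// respond, no matter how fast each individual vertex is.
const roundTripEdges = 2 * criticalPath(chain, 0); // 8
```

Scaling out more instances of any single vertex does nothing to shorten this path, which is the architectural limit the section describes.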
Opponents of nano-services use this as an argument against further subdividing existing micro-services; indeed, nano-services are sometimes considered an anti-pattern (https://dzone.com/articles/soa-anti-pattern-nano-services). However, nano-services have a useful property.
Let's assume that the vertices in our graph above are the lambda functions that comprise our nano-services. What stops us from deploying those functions together as an aggregated endpoint? If we can treat each vertex as a single composable piece, we can use a deploy/provisioning strategy that ships the whole cluster as one unit. For homogeneous software, this is significantly easier to do: isolated Node.js projects can be chained together via an auto-generated linking script, and repeated libraries that would otherwise be deployed at each node can be aggregated and declared only once, minimising resource usage. The network thus exhibits clustering behaviour, pulling dependencies closer to where they are used.
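What such a linking shim might generate can be sketched as follows. The registry, `register`, and `invoke` names are hypothetical: the point is only that an edge between co-deployed nano-services collapses into a plain function call, while an edge leaving the aggregate would remain a network call.

```javascript
// Hypothetical linking shim: when two nano-services are provisioned into the
// same deployment unit, "invoke" resolves the call in-process; otherwise it
// would fall through to a network call.
const localRegistry = new Map();

const register = (name, fn) => localRegistry.set(name, fn);

const invoke = async (name, payload) => {
  const local = localRegistry.get(name);
  if (local) {
    return local(payload); // clustered: plain function call, no network IO
  }
  // Not co-deployed: this is where an HTTP or queue call would go instead.
  throw new Error(`${name} is not deployed in this aggregate`);
};

// Two nano-services aggregated into one artifact, sharing one runtime and
// one copy of their common libraries.
register("price", async ({ amount }) => ({ amount, total: amount * 1.2 }));
register("format", async ({ total }) => ({ display: `£${total.toFixed(2)}` }));
```

Chaining `invoke("price", { amount: 10 })` into `invoke("format", …)` then crosses no interface boundary at all, which is precisely the clustering benefit described above.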
We can and should take advantage of new services that allow real-time monitoring of message packets within our architecture. By tracing network paths via correlation IDs, we can analyse these logs to identify potential optimisation vectors and tag sub-optimal routes for improvement. This process can then be automated, allowing the Business Intelligence division within your corporation to dynamically re-provision your network in real time, optimising constantly without continuous programmer intervention.
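One piece of that analysis might look like the sketch below. The log shape (`correlationId`, `route`, `durationMs`) and the threshold are assumptions for illustration, not the output of any real monitoring service.

```javascript
// Illustrative sketch: given structured logs tagged with correlation IDs,
// aggregate span durations per route and flag the slowest routes as
// candidates for re-provisioning.
const slowRoutes = (logs, thresholdMs) => {
  const byRoute = new Map();
  for (const { route, durationMs } of logs) {
    const stats = byRoute.get(route) || { total: 0, count: 0 };
    stats.total += durationMs;
    stats.count += 1;
    byRoute.set(route, stats);
  }
  return [...byRoute]
    .map(([route, { total, count }]) => ({ route, meanMs: total / count }))
    .filter(({ meanMs }) => meanMs > thresholdMs)
    .sort((a, b) => b.meanMs - a.meanMs);
};
```

Feeding the flagged routes back into the provisioning strategy is what would make the clustering behaviour above dynamic rather than hand-tuned.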
Such optimisations could be tested via A/B testing, or in a reproduced sandbox of your prod stage that requires manual approval before promotion.
Testing and Dependencies
Due to the simple nature of lambda nano-services, testing should be easy to accomplish for a single developer or small team. A lambda can be considered a small unit of work that can be rapidly conceived ad hoc as needed and added to the business’s software ecosystem.
At this point, we have completely rewritten our architecture to allow for dynamic clustering and reorganisation via run-time architectural optimisations, but we are still not happy with our performance. Despite clustering, we find that network IO and waiting on asynchronous dependencies within our network are wasting clock cycles.
A typical problem in server-less computing runs as follows: we need to call a number of external dependencies, and only once they have all completed can we continue execution. Waiting for these calls in a monolith or micro-service is generally not a problem, as the system is most likely handling multiple requests at the same time and can do useful work while it waits for IO to complete.
In a server-less architecture, however, it is a problem, since we typically do not multiplex calls within the same lambda instance. Once we have called all the necessary endpoints, we waste cycles while we wait for them to complete. For simple calls the waiting period is generally negligible, as lambda functions are usually deployed in close proximity to one another, avoiding excessive network traversal; but as your architecture evolves, this time will grow alongside it. Borrowing techniques from functional and parallel programming, we can use memoization and barriers.
Our function can call the necessary external endpoints and request that their responses be redirected to a synchronisation service, which handles waiting for the responses to be collated, while the calling lambda stores its current context as a memoized snapshot. Once the barrier has resolved, with all calls complete, the saved context can be regenerated with the results and execution can continue.
After memoization has occurred, the lambda function can safely shut down to conserve cycles while it waits. Logs and analysis from this process are useful to your BI division as well: the synchronisation service can identify where slowdowns occur based on which calls contribute the most wait time at your barriers, and excessive wait times can act as a trigger for the network to request re-optimisation.
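The barrier described above can be sketched in miniature. Everything here is an assumption for illustration: in a real deployment the snapshot store would be external (a database or cache) so the lambda can actually terminate, and `resume` would trigger a fresh invocation rather than call back in-process.

```javascript
// Minimal in-memory sketch of the memoization-plus-barrier idea. All names
// (memoizeAndWait, resolveOne, the snapshot store) are hypothetical.
const snapshots = new Map(); // barrierId -> { context, pending, results, resume }

// Save the caller's context and record how many responses we are awaiting.
// The calling lambda can then shut down instead of burning cycles on IO.
const memoizeAndWait = (barrierId, context, pendingCalls, resume) => {
  snapshots.set(barrierId, {
    context,
    pending: pendingCalls,
    results: {},
    resume,
  });
};

// The synchronisation service invokes this as each redirected response lands.
const resolveOne = (barrierId, callName, result) => {
  const snap = snapshots.get(barrierId);
  snap.results[callName] = result;
  snap.pending -= 1;
  if (snap.pending === 0) {
    // Barrier resolved: regenerate the saved context with the collated
    // results and continue execution.
    snapshots.delete(barrierId);
    snap.resume(snap.context, snap.results);
  }
};
```

A caller might run `memoizeAndWait("req-1", { userId: 42 }, 2, resume)` after firing two downstream calls; when `resolveOne` has delivered both responses, `resume` receives the original context together with the collated results.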
The next stage
With the increase in AI research, the next logical step for most business software architectures is for the network to be able to alter itself while remaining efficient and profitable to the business. This requires a flexible collection of divisible pieces that can be analysed and manipulated at every stage of composition, down to its smallest constituent parts.