How Platform Engineering Builds your Path to Production

20 May

Inigo Basterretxea

Well-embedded patterns and workflows create a complex environment for Data & AI transformations within enterprises. These patterns, which large and diverse workforces are familiar with and which have proven to work, were created to accelerate new projects by building on past success and to reduce risk by learning from past failure. There were likely good reasons for things to be that way.

However, failing to evolve in an environment of rapid technological advancement risks poor business decisions, so large enterprises may choose to turn to tech consultancies for help.

Building PoCs is, frankly, relatively easy. They are built under highly controlled conditions, often isolated from production systems, and are usually not serving a large number of users who can provide immediate feedback by testing the application in unexpected ways.

Moving them to production in an enterprise environment, however, is a completely different beast. It requires a thorough understanding of software systems, software delivery and cloud technologies: enter Platform Engineering. But what is it?

Platform Engineering

Data is stored somewhere, applications run somewhere. Traditionally, this would happen on hardware (servers) owned by companies themselves, but nowadays that somewhere is often a platform on the Cloud. Platforms host, run and enable new data and software products.

Platform engineering involves building and maintaining the things that deploy and run a product's source code, so development teams can focus on developing. It enables everyone's work (Data Scientists, Machine Learning Engineers, Data Engineers, etc.) to be delivered in a production environment to end-consumers.

Unfortunately, platform engineering is often overlooked, because its value is not immediately visible and much of it lies in non-functional system components. Yet any time you interact with an application or website alongside thousands of other users, you are indirectly benefiting from it. Applications run on servers, process data and need to store it. Those resources need to be connected in secure ways, limiting risk to the owners and their users, but they also need to be developed quickly and run at a reasonable cost.

More often than not, platforms will already exist and there will be existing layers that can be reused (e.g. networking to company systems) to decrease time-to-market in good enough ways. A person who can quickly grasp the existing patterns and how to navigate them efficiently will therefore have a better chance of success. A platform engineering consultant combines both the technical and the soft skills required to ship software products to production.

Recommended Delivery Approach

At Mesh-AI, our team has a wealth of experience that informs our approach to successfully delivering outcomes within Data & AI. A prerequisite for this is a clear vision of why we are here and what we are doing. A team’s first order of business is to read the Statement of Work to understand the mission and make sure the scope of the work is reasonable for the timelines specified. We follow this with a day or two of discovery workshops, although these may be held before the Statement of Work is written, to ensure it is clear and contains good estimates of the solution, team size, etc.

The outcome of our work is removing pain from the customer’s team by providing software designed and built to fulfill requirements that benefit people, e.g. integrating data sources so that people can answer key business questions such as "What is our current exposure and how do we reserve capital accordingly?" (in insurance). It is often tempting to build a feature-rich and sophisticated solution, but start simple: get something working and validate it with the end user whilst you’re building. Produce an MVP architecture for the solution within the first two weeks, get it reviewed and start building.

During the build phase of a project, we ask the customer what their path to production is: both the business processes required to deploy to Production and the physical process for doing so (e.g. existing automated deployment tooling).

Understanding the steps and gates to deploy to an environment is key to success, as it is often not the technical challenges but the organisational processes that cause delays. We seek out technology and key process owners to gain insight into their concerns. Most enterprises include the following on their path to production:

Architecture Review
Privacy Assessment
Security & Threat Model
Change Approval/Advisory Boards

Start working towards passing the gates to production early and find out how to expedite those processes, as some of them can take weeks or even months in large enterprises. I have seen Security & Threat Models take 4 months to be completed and approved.
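
As a rough illustration of why early starts matter, the sketch below works backwards from a go-live date to the latest day each gate submission could begin. The 16-week figure mirrors the 4-month example above; every other lead time, the go-live date and the names Gate and latest_start_dates are assumptions made up for illustration, and the gates are assumed to run in parallel.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass(frozen=True)
class Gate:
    name: str
    lead_time_weeks: int  # assumed worst-case turnaround, not a measured figure


def latest_start_dates(go_live: date, gates: list[Gate]) -> dict[str, date]:
    """Return the latest date each gate can start and still finish by go-live.

    Assumes gates run in parallel; sequential gates would need their
    lead times summed instead.
    """
    return {g.name: go_live - timedelta(weeks=g.lead_time_weeks) for g in gates}


# Hypothetical lead times, for illustration only.
GATES = [
    Gate("Architecture Review", 3),
    Gate("Privacy Assessment", 4),
    Gate("Security & Threat Model", 16),
    Gate("Change Approval/Advisory Board", 2),
]

if __name__ == "__main__":
    starts = latest_start_dates(date(2025, 9, 1), GATES)
    # Print the gates that need attention first at the top.
    for name, start in sorted(starts.items(), key=lambda item: item[1]):
        print(f"{start.isoformat()}  start {name}")
```

Even with made-up numbers, the ordering makes the point: the slowest gate, not the build itself, often sets the date by which work must begin.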

Show that you have listened, understood their concerns and exercised diligence to meet their expectations. Giving respect and showing care towards a specialist’s domain can stand you in good stead. People will be more likely to share their time and understanding with you to help navigate processes in an unfamiliar environment. It can turn a checklist process into a genuine conversation and help to address any gaps in approval submissions.

Dealing with Blockers

However, no matter how diligent you are in your approach to discovery and delivery (of both technical and business processes), unforeseen events may still delay projects.

In such instances, the first thing to do is to acknowledge the problem as soon as it is identified (e.g. missing developer permissions) and raise it with the rest of the team to find domain experts or owners.

In the meantime, finding out who runs the system is the best course of action. This can sometimes be done through the network of relationships you have built, by searching internal documentation (e.g. Confluence) or by using the code repository history (e.g. git blame). When interacting with system owners, speak reasonably and assume you could be wrong, as you do not know the history of why a system is set up the way it is. This is where excellent engineers can provide alternative solutions or workarounds. It is fine to identify a problem, but it is even better to also propose a solution.
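
Where the repository is available, commit history can give a first hint at who to talk to. The following is a minimal sketch, assuming git is installed and the code is checked out locally. It uses git log rather than git blame to count commits per author for a path, and the function names are hypothetical. Commit counts only suggest who knows the code, not who formally owns the system.

```python
import subprocess
from collections import Counter


def count_authors(log_output: str) -> list[tuple[str, int]]:
    """Rank authors by commit count, given one author per line of input."""
    names = [line.strip() for line in log_output.splitlines() if line.strip()]
    return Counter(names).most_common()


def likely_owners(repo_dir: str, path: str, top: int = 3) -> list[tuple[str, int]]:
    """Ask git who has committed to `path` most often (a hint, not proof of ownership)."""
    result = subprocess.run(
        ["git", "-C", repo_dir, "log", "--format=%an <%ae>", "--", path],
        capture_output=True,
        text=True,
        check=True,  # raise if git fails, e.g. repo_dir is not a repository
    )
    return count_authors(result.stdout)[:top]
```

A call such as likely_owners(".", "some/path/in/the/repo"), where the path is a placeholder, would return the most frequent committers for that path, which is a reasonable list of people to approach politely.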

If it becomes apparent that blockers will cause substantial delays, it is time to engage sponsors or senior stakeholders. Set expectations about how delivery timelines may slip, and ask whether they can assist by connecting you with the relevant domain or process owners, or by applying pressure where needed to accelerate the resolution of dependencies.

After resolving the issue, report back to the team and stakeholders, then document and share the solution so others can benefit from your findings.

Software Delivery

As mentioned earlier, our deliverables are mainly composed of software. As engineers, we pride ourselves on writing and delivering good software. In this regard, some non-negotiable conditions are that it must be:

Written as code: so it can be consistently deployed across multiple environments and tested.
Deployed through automation: to reduce risk as humans are prone to making mistakes.
Observable: monitoring and alerting are in place so that when things break they are identified, troubleshot and resolved.
Iterative, not big bang nor complex: premature optimisation can cause delays or reduce flexibility to accommodate changing requirements. In addition, frequent releases reduce failure rates and the need to roll back multiple features in a release, boost developer morale, and produce rapid and incremental value to the consumer.
Well documented for decisions, architecture and operation: as consultants we are not often involved in the medium to long-term operation of our solutions, so this is a core component of our work. New joiners to the project should be able to quickly learn the background, how to maintain and evolve it.
Taking into account the tech environment it is built in: pick tools the internal teams are already familiar with or could adopt with ease. Do your developers know Python and your end-users SQL? Choose the best tool for the job that many can use.
Including Disaster Recovery mechanisms: what if a user or a developer accidentally drops a table or there is a service outage? Simulations should also be run to verify they work.
Modelled for volume and cost: the volume and cost of running the solution should be modelled before starting the build phase, anticipating a 10-100X increase in data volume or user traffic. This ensures the solution can last for years as the workload grows and that the price is appropriate for the platform. A 100X increase in volume shouldn’t increase your operational costs by 100X; choose appropriate tooling and understand the cost structure.
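
To make the last point concrete, here is a back-of-the-envelope sketch of how a bill might scale with volume. It assumes a simple linear cost structure (a fixed monthly baseline plus a per-unit charge). The figures and the names CostModel and cost_multiplier are hypothetical, and real cloud pricing usually has tiers and discounts that this ignores.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CostModel:
    fixed_monthly: float  # baseline spend independent of volume (hypothetical)
    cost_per_unit: float  # marginal cost per unit of volume (hypothetical)

    def monthly_cost(self, units: float) -> float:
        return self.fixed_monthly + self.cost_per_unit * units


def cost_multiplier(model: CostModel, baseline_units: float, growth: float) -> float:
    """How many times the monthly bill grows when volume grows by `growth`."""
    return model.monthly_cost(baseline_units * growth) / model.monthly_cost(baseline_units)


if __name__ == "__main__":
    # Made-up numbers: 2,000 fixed per month plus 0.50 per unit, 1,000 units today.
    model = CostModel(fixed_monthly=2000.0, cost_per_unit=0.5)
    for growth in (10, 100):
        print(f"{growth}X volume -> {cost_multiplier(model, 1000, growth):.1f}X cost")
```

With these made-up numbers, 100X the volume costs roughly 21X as much, whereas a purely per-unit pricing model with no fixed baseline would cost the full 100X. Running the same exercise against a real price list before the build phase is what reveals which cost structure you are actually signing up for.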

Minimising Risk and Moving to Production

As tech consultants, we don't sell a tangible product; we offer specialised expertise accumulated through repeatedly solving similar problems within different environments. If we can't do this in a timely manner and at a cost that can be justified by the expected ROI, we have failed at our job.

In a data-enriched world full of AI opportunities, this ROI can only be realised if data and software are in the hands of real people, to make decisions and serve a business purpose.

Information is power and with it comes great responsibility. This is why businesses in highly regulated industries have thorough internal mechanisms that exist to minimise the risk involved in managing valuable data in Production.

A good platform engineering consultant shares their client's concerns, understands the framework that solutions need to be delivered within and enables the customer to achieve its business goals. If you're not in production after six months, something has gone wrong.
