A fairly widespread realization that open source lags behind proprietary software in cloud computing, specifically in operating the cloud, has fostered several developments. First, the Red Hat Office of the CTO started an initiative to make operations as fundamental as functionality in the upstreams it contributes to and the products it produces, and the open source cloud community has likewise been exposed to the concept and embraced it. Second, consuming public cloud is not always cost effective, yet standing up an open source alternative presents many challenges, from the skill sets needed to run it to the lack of an open source community that addresses complex operational issues, which leaves potential operators on their own. Third, the OpenStack Foundation (OSF) created OpenInfra Labs to foster an upstream geared toward open source cloud operations. (The scope neither requires nor is limited to OpenStack; it is intended to be a place where any and all open source cloud projects are welcome.) Fourth, there is a growing realization that AI-aided operations are needed to lower the barriers to open source cloud operations. Together, these circumstances provide an opportune moment to bring the power of open source to bear on cloud operations.
The Manifesto is intended to help foster an open community initiative to support, advance and coordinate existing and new efforts toward the ‘Operate First Principle’: the paradigm that, in the age of cloud, operational excellence must be encapsulated in software projects and related services, and that open source communities must therefore consider the operability of their projects in a realistic production cloud environment.
Moreover, the operability of the resulting clouds, which typically incorporate many projects and products, must be understood and engineered to give operators a competitive experience, so that open source clouds are a reasonable alternative to proprietary clouds. While we continue to address operability within various community projects (such as Kubernetes Operators), the lack of an upstream open source community to advance overall cloud operational considerations is viewed as a threat to open source’s relevance in the cloud.
Why?
Imagine a company building airplanes. Not only do they need to design, build and test each component and subsystem individually and test them together as a whole, they also need to employ pilots to actually fly the airplane in real world conditions as a critical part of the engineering process. Feedback from pilots on the overall flight worthiness of an aircraft is critical for safety, but feedback on the piloting experience is also important. Do the operational controls, indicators and assists make the aircraft easier to operate or hinder operation?
For open source clouds, we need to consider the operational experience and drive toward capabilities that reduce complexity and lower the barrier to entry for both standing up and operating open source clouds. Open source clouds need to address all the operational requirements, such as installation, upgrading, monitoring of operation and use, allocation of resources, hybrid cloud and multi-cloud considerations, security, compliance, etc. With Operate First, we will leverage the experience of operational teams at existing open source clouds that handle real-world use cases, both to provide operational input to projects where appropriate and to share expertise and tooling that the community can benefit from. Having an upstream community dedicated to overall cloud operation provides an opportunity to leverage the power of open source to advance cloud operability. When we Operate First, operational knowledge, which has become as valuable as code itself, becomes the subject of collaboration between the communities who use the services and those who operate them. Being able to actively manage the feedback loop between these communities will become an open source strength. Providing developers with an environment where they can execute and operate their software in a production cloud opens the possibility of debugging, inspecting and observing that software in real-world scenarios.
Could cloud computing cause open source to lose to proprietary software? Clouds are based on open source technology, but they are wrapped in proprietary software that provides an easy-to-use operational experience that pure open source does not compete with today. The barrier to entry for pure open source is thus very high. Operational and user experience expectations have been set by the public cloud vendors; that is their lock-in. Our efforts to make open source suitable for enterprise use must now evolve to fit the new enterprise model, which relies on cloud computing. We need to adapt to meet this challenge, or we risk diminishing open source’s value and seeing the progress made to date become less significant. Cloud has changed the target for open source by adding an operational capability that needs to be addressed.
Operate First needs to include operating the open source cloud at scale to ensure that open source cloud operators’ expectations are properly set and met.
Key tenets
With Operate First, we will use open source production cloud environments to allow developers to evaluate the functionality and the operability of their software at scale, with real-world workloads. Developers can evaluate quality, performance and scale using open source software in an open source production cloud environment, catching issues, ensuring scale and improving operations as we go. Drawing projects into the cloud will establish a direct link between cloud operators and open source projects, providing a feedback loop for identifying and resolving issues. Where appropriate, we will move toward better use of Continuous Deployment to catch issues closer to the source, improve response and drive improvements. We will push operability into a targeted upstream where a wider community can participate and drive an open source solution for cloud operation. We will also strive to feed operational considerations back to the various upstreams to ensure that those considerations are built in and robust for the technologies they nurture. We will measure our cloud operator experience against best-in-class proprietary competitors to drive toward a first-class experience for our open source clouds. The environment in which upstream communities run their software will provide a “Best effort community SLA”.
How?
With Operate First, our community will open up operational knowledge to all open source communities to improve integration and operability at the source. We need to establish a tight connection between our cloud operators and open source developers. We will look at existing efforts and seek consensus on a true open cloud operations model. For example, techniques from Site Reliability Engineering (SRE), GitOps and AIOps are expected to help shape the solutions. Whatever the ultimate model, the project will determine a workable path forward for continuous improvement toward an Operate First methodology for open source. Extending DevOps into a live production cloud for open source projects will go a long way toward ensuring functionality as well as operability.
Resources such as the Mass Open Cloud will give developers access to vet their open source projects for both functional and operational aspects.