<feed xml:base='/feed/OpenStack/atom.xml' xmlns='http://www.w3.org/2005/Atom'><id>http://blog.hendrikvolkmer.de/</id><title type='text'>Refactoring life. Mercilessly. &gt;&gt; OpenStack</title><updated>2013-04-12T19:36:21+02:00</updated><link href='/feed/OpenStack/atom.xml' rel='self'/><link href='http://blog.hendrikvolkmer.de/' rel='alternate'/><author><name>Hendrik Volkmer</name></author><entry><id>/2013/04/12/there-will-be-no-reliable-cloud-part-3</id><title type='text'>There will be no reliable cloud (part 3)</title><content type='html'>&lt;h2&gt;How I stopped worrying and love the cloud&lt;/h2&gt;




&lt;p&gt;If you have read &lt;a href=&quot;http://blog.hendrikvolkmer.de/2013/04/03/there-will-be-no-reliable-cloud-part-1/&quot;&gt;part 1&lt;/a&gt; and &lt;a href=&quot;http://blog.hendrikvolkmer.de/2013/04/09/there-will-be-no-reliable-cloud-part-2/&quot;&gt;part 2&lt;/a&gt;, you may think all hope is lost and this cloud cannot work at all. However, there is a way to create fault-tolerant, resilient applications on top of clouds. In fact, if the application is built in a distributed, fault-tolerant and resilient fashion, it will have no problem at all running on a cloud infrastructure.&lt;/p&gt;




&lt;p&gt;As mentioned in part 2: The main thing to get away from, when thinking about this, is the idea of &quot;a server needs to be fault tolerant&quot;. Forget the server. It&apos;s about the end service. If your body is a few years old (which it is, if you can read this), I guess that not a single cell of your body is the same as when you were born. Yet, you are still alive. That&apos;s the kind of thinking that has to be applied to a service in a cloud setting.&lt;/p&gt;




&lt;h2&gt;Green field cloud native apps&lt;/h2&gt;




&lt;p&gt;This behaviour is actually not that hard to achieve and there is a lot of documentation about it. There&apos;s an &lt;a href=&quot;http://d36cz9buwru1tt.cloudfront.net/AWS_Building_Fault_Tolerant_Applications.pdf&quot;&gt;AWS Whitepaper&lt;/a&gt; which is AWS specific, but most of the ideas apply to any cloud setup. Netflix also share a lot of the ideas and implementations behind their architecture &lt;a href=&quot;http://www.slideshare.net/netflix&quot;&gt;on Slideshare&lt;/a&gt; and &lt;a href=&quot;https://github.com/Netflix/&quot;&gt;on GitHub&lt;/a&gt;. One very recent and very good talk is this &lt;a href=&quot;http://www.infoq.com/presentations/Netflix-Architecture&quot;&gt;Netflix Architecture&lt;/a&gt; talk.&lt;/p&gt;




&lt;p&gt;The main idea is actually pretty simple:&lt;/p&gt;




&lt;ul&gt;
&lt;li&gt;Identify and tear apart stateless and stateful parts of the application&lt;/li&gt;
&lt;li&gt;Make the stateful parts redundant using real, distributed data stores (as mentioned in &lt;a href=&quot;http://blog.hendrikvolkmer.de/2013/04/03/there-will-be-no-reliable-cloud-part-1/&quot;&gt;part 1&lt;/a&gt;: Riak, Cassandra, MySQL Galera, etc.)&lt;/li&gt;
&lt;li&gt;Make sure that the data store parts are distributed across failure domains (e.g. Availability Zones, Regions, etc.)&lt;/li&gt;
&lt;li&gt;Make sure that the dependencies of your system are known to you and designed in a way that reduces the likelihood and impact of failure&lt;/li&gt;
&lt;li&gt;Using a &lt;a href=&quot;http://www.infoq.com/presentations/Micro-Services&quot;&gt;Micro-Services&lt;/a&gt; approach will help you make these dependencies explicit and lets you scale the individual parts independently as needed. You can also make the best fault behaviour decisions at a very fine-grained level, so that degraded operation and partial failure are not a big problem&lt;/li&gt;
&lt;/ul&gt;
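
&lt;p&gt;To make the &quot;distribute across failure domains&quot; point a bit more concrete, here is a minimal Python sketch. The zone names and the helpers place_replicas and survives_zone_failure are made up for illustration and are not part of any cloud provider&apos;s API; in a real setup the placement would go through your provider&apos;s scheduler or your data store&apos;s topology settings.&lt;/p&gt;

```python
from itertools import cycle


def place_replicas(replica_count, zones):
    """Assign replicas to zones round-robin so no single zone holds them all.

    Hypothetical helper: zone names are plain strings chosen by the caller.
    """
    if not zones:
        raise ValueError("at least one failure domain is required")
    zone_cycle = cycle(zones)
    return {f"replica-{i}": next(zone_cycle) for i in range(replica_count)}


def survives_zone_failure(placement, failed_zone):
    """True if at least one replica lives outside the failed zone."""
    return any(zone != failed_zone for zone in placement.values())


zones = ["zone-a", "zone-b", "zone-c"]
placement = place_replicas(3, zones)
print(placement)
print(all(survives_zone_failure(placement, zone) for zone in zones))
```

&lt;p&gt;The check at the end is the interesting part: it asks what happens when a whole zone disappears, not what happens when a single instance does.&lt;/p&gt;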




&lt;blockquote&gt;&lt;blockquote&gt;&lt;p&gt;Netflix, for example, uses Cassandra as their data store. That does not mean that they have one huge Cassandra cluster spanning 100s of nodes. They have 100s of Cassandra clusters with a few (about 5-50) nodes each. These clusters are completely independent, and each service has its own cluster (which is distributed over AZs or regions as needed).&lt;/p&gt;&lt;/blockquote&gt;&lt;/blockquote&gt;




&lt;p&gt;One interesting aspect of distributed data stores is that failure or hand-over of data or requests is a totally normal part of the day-to-day operation of the system. Adding a new node will lead to rebalancing, as will the failure of a node. A system that does these things all the time will work better than some kind of hot-standby system that will &quot;fail over&quot; when something fails, which may happen only once in a while.&lt;/p&gt;




&lt;p&gt;While this kind of application design puts some burden on the developer to deal with all these failure conditions that have been abstracted away (or, to put it more realistically, ignored) through &quot;highly available backends&quot;, the running application is the best place to decide what to do from a business logic point of view if some kind of failure occurs. For example, if your real-time web chat system does not work at the moment, you could just send the message in an asynchronous fashion via the non-real-time part of the system (IIRC Facebook does it that way). You can hardly put this kind of logic into your HA-failover scripts.&lt;/p&gt;
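
&lt;p&gt;A rough Python sketch of the chat example, just to show where the decision lives. All names here (send_realtime, send_async, the outbox list, the realtime_up flag) are hypothetical stand-ins; a real system would talk to an actual push channel and a durable queue.&lt;/p&gt;

```python
class RealTimeUnavailable(Exception):
    """Raised when the real-time delivery path cannot be reached."""


def send_realtime(message, realtime_up):
    # Stand-in for a push over a live connection.
    if not realtime_up:
        raise RealTimeUnavailable("real-time channel is down")
    return "delivered-realtime"


def send_async(message, outbox):
    # Stand-in for the non-real-time path: queue the message for later delivery.
    outbox.append(message)
    return "queued-async"


def send_message(message, outbox, realtime_up=True):
    """Try the real-time path first and degrade to the asynchronous one."""
    try:
        return send_realtime(message, realtime_up)
    except RealTimeUnavailable:
        return send_async(message, outbox)


outbox = []
print(send_message("hello", outbox, realtime_up=True))
print(send_message("hello again", outbox, realtime_up=False))
print(outbox)
```

&lt;p&gt;The fallback is a business decision (a late message is better than a lost one), which is exactly why it sits in the application and not in the infrastructure.&lt;/p&gt;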




&lt;p&gt;Another great talk by Michael Nygard about &lt;a href=&quot;http://www.infoq.com/presentations/Stability-Anti-patterns-Michael-Nygard&quot;&gt;Stability Anti-Patterns&lt;/a&gt; shows that design for failure is inevitable for the kind of environment we&apos;re deploying applications in today. And this isn&apos;t even cloud specific!&lt;/p&gt;




&lt;p&gt;Even though complexity is not mentioned explicitly, the &quot;reduce integration points&quot;-message is exactly that: Reduce complexity!&lt;/p&gt;




&lt;blockquote&gt;&lt;blockquote&gt;&lt;p&gt;The video also contains a fun anecdote about TCP (and how firewalls violate it). Now keep that example in mind regarding the cloud analogy of TCP vs. UDP and think about how this situation would have been different if UDP had been used and the retransmit etc. logic would have been at the application layer…&lt;/p&gt;&lt;/blockquote&gt;&lt;/blockquote&gt;




&lt;p&gt;If you want to make your application more reliable it is actually a good idea to poke it all the time and try to make it fail. This is basically what Netflix does with their chaos monkey approach. The idea behind this is &lt;a href=&quot;http://en.wikipedia.org/wiki/Antifragile:_Things_That_Gain_from_Disorder&quot;&gt;Antifragility&lt;/a&gt;. Through the interaction of your software system with the developers and the feedback you get from failures, you can basically turn the complex system that consists only of the software application into a &lt;a href=&quot;http://en.wikipedia.org/wiki/Complex_adaptive_system&quot;&gt;complex adaptive system&lt;/a&gt; (consisting of the running(!) software and the interactions of the users and operators), that can respond to change and become more resilient.&lt;/p&gt;




&lt;h2&gt;Other benefits of apps in the cloud that help with reliability&lt;/h2&gt;




&lt;p&gt;... and don&apos;t need any special reliable backend. Using a cloud setup you can basically test and deploy in ways that are not - easily - possible in a traditional world (like &lt;a href=&quot;http://martinfowler.com/bliki/BlueGreenDeployment.html&quot;&gt;Blue/Green Deployment&lt;/a&gt;), for example by spinning up 100s or 1000s of instances to do load and failure testing. And you can do that with up to 100% of the same environment (architecture-wise, operating system, number of services etc.) as production.&lt;/p&gt;




&lt;h2&gt;But I have a legacy system and want to put it into the cloud&lt;/h2&gt;




&lt;p&gt;The main thing to remember is not to confuse &quot;cloud instance&quot; with &quot;server&quot;. So if you need to roll your classic HA-DB setup in the cloud, do not put it in one Availability Zone on two instances. Use two zones that are guaranteed to be independent.&lt;/p&gt;




&lt;p&gt;There is actually a good overview - although very old school in nature (who would have thought - it&apos;s from Oracle!): &lt;a href=&quot;http://www.infoq.com/news/2013/03/MySQL-Reference-Architectures&quot;&gt;MySQL Reference Architectures&lt;/a&gt;. If you choose the failure domains as communicated by the cloud provider, there should not really be a big problem.&lt;/p&gt;




&lt;p&gt;If your cloud provider (or internal cloud) cannot do that, you can still plan to reduce your MTTR. This is a very good idea anyway! I&apos;m always surprised how little thought seems to be put into reducing MTTR. &quot;We have an HA system, so we&apos;re good&quot;. No! Think about what happens after a failure and about the impact! You cannot predict the probabilities but you can predict the impact of failures pretty well!&lt;/p&gt;




&lt;p&gt;Of course you can &lt;a href=&quot;http://gigaom.com/2013/01/14/resiliency-and-reliability-the-devil-is-in-the-detail/&quot;&gt;split up workloads in different cloud environments&lt;/a&gt; - latency might become a problem. So YMMV.&lt;/p&gt;




&lt;p&gt;Another problem with non-cloud-native apps that run on more than one server: at some point most of them use something like NFS as a shared file-based datastore. Of course you can run NFS on some cloud instance, but then NFS itself is not really distributed and is prone to catastrophic failure. Also, the filesystem abstraction is really broken over the network. You might get away with it if you know and control the network itself. In a large-scale cloud network... not the best idea (failure semantics for file systems are really very different from those of network protocols...).&lt;/p&gt;




&lt;p&gt;So if you want to run this kind of application in the cloud, you have to think about the tradeoffs: Maybe you don&apos;t really need a shared-filesystem backend. Maybe you can easily change the application to use HTTP-based object storage. Or you could run some distributed NFS-like replacement such as GlusterFS or XtreemFS on top of cloud infrastructure (again: use more than one zone). As always: it depends.&lt;/p&gt;




&lt;h2&gt;Further reading:&lt;/h2&gt;




&lt;ul&gt;
&lt;li&gt;&lt;a href=&quot;http://www.12factor.net/&quot;&gt;Good overview of a modern web app approach&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;http://infoq.com&quot;&gt;Lots of good talks&lt;/a&gt; - Look around a bit, almost every relevant topic is covered&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;http://www.slideshare.net/netflix&quot;&gt;Technical talks by Netflix&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;http://radar.oreilly.com/2012/04/complexity-vs-simplicity.html&quot;&gt;Complex vs. Simple Storage systems&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Antifragility (Read and understand it)&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;http://www.infoq.com/presentations/Agile-Theory&quot;&gt;Agile Theory&lt;/a&gt; - Although the title is &quot;agile&quot;, it&apos;s really about complexity&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;Closing thoughts&lt;/h2&gt;




&lt;p&gt;This is certainly not all I have to say about this topic (expect more posts ...), but instead of collecting more and more links and ideas, I wanted to get this out there to fuel the discussion about these ideas. I think &quot;the cloud&quot; as in &quot;compute infrastructure&quot; is mostly still misunderstood. It seems to be something we have had for a long time. It&apos;s &quot;just VMs in the internet behind an API&quot; or &quot;Virtualization 2.0&quot;. It&apos;s not. It&apos;s so much more! It&apos;s different. If you think differently about some things...&lt;/p&gt;



</content><updated>2013-04-12T00:00:00+02:00</updated><category term='OpenStack'/><link href='/2013/04/12/there-will-be-no-reliable-cloud-part-3/' rel='alternate'/></entry><entry><id>/2013/04/09/there-will-be-no-reliable-cloud-part-2</id><title type='text'>There will be no reliable cloud (part 2)</title><content type='html'>&lt;p&gt;While the &lt;a href=&quot;http://blog.hendrikvolkmer.de/2013/04/03/there-will-be-no-reliable-cloud-part-1/&quot;&gt;first part&lt;/a&gt; was more basic and technical information, this second part is about why I think it is impossible, and not viable business-wise, to aim for high availability of the cloud in the infrastructure layer. Part 3 will then go into why this actually isn&apos;t that bad and how we can still use this &quot;crappy infrastructure&quot; to build systems that are available to the end user.&lt;/p&gt;




&lt;p&gt;I just want to make the following point:&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Complexity + Scale =&gt; Reduced Reliability + Increased Chance of catastrophic failures&lt;/em&gt;&lt;/p&gt;




&lt;p&gt;This is all this post is about.&lt;/p&gt;




&lt;h2&gt;Complexity&lt;/h2&gt;




&lt;p&gt;Almost any software system is a complex system. A software system that has networking components certainly is complex and if you put it on a cloud infrastructure, I think there is no argument that this system is not complex.&lt;/p&gt;




&lt;p&gt;&lt;a href=&quot;http://www.ctlab.org/documents/How%20Complex%20Systems%20Fail.pdf&quot;&gt;Complex systems fail in certain ways&lt;/a&gt;. If they are poorly designed these failures are catastrophic, meaning the whole system is taken down. In terms of the cloud this means: All software systems that run on the infrastructure are not available any more and may not recover ever.&lt;/p&gt;




&lt;p&gt;This is very abstract, so where does complexity show in a cloud infrastructure?&lt;/p&gt;




&lt;h3&gt;Failure domains&lt;/h3&gt;




&lt;p&gt;Let&apos;s look at a basic cloud infrastructure setup (say a compute infrastructure implemented with OpenStack nova):&lt;/p&gt;




&lt;p&gt;&lt;img src=&quot;/assets/2013/04/09/cloud-controller.png&quot;  alt=&quot;Controller deployments&quot; /&gt;&lt;/p&gt;




&lt;p&gt;We have a controller and four compute nodes (Hypervisors). Now the controller is certainly a Single point of failure, so let&apos;s add another one (HA!). You can immediately see how this increased the complexity! And if the failover fails you gained nothing.&lt;/p&gt;




&lt;p&gt;Compare that with the approach of - instead of adding another HA-style controller - creating another zone which itself is as simple (and unreliable) as the basic example.&lt;/p&gt;




&lt;p&gt;If you just look at one zone it is unreliable and does not seem to make sense. But if you look at the whole system consisting of two zones you can actually see that you gained something: You have two independent(!) systems with known failure semantics: Failure of one controller in zone 1 will never lead to a failure of zone 2.&lt;/p&gt;




&lt;p&gt;Of course you have to build your system in a way that makes use of this. This is the topic of part 3.&lt;/p&gt;




&lt;p&gt;Why is this approach useful? Remember: Almost any public cloud provider takes this approach. And please don&apos;t start to argue that one controller per zone is not a good idea and you still need HA for that. This is a pattern. Of course you should make the controller in a zone as reliable as possible - taking into account the trade-offs of zone failure and costs.&lt;/p&gt;




&lt;blockquote&gt;&lt;p&gt;At the controller level you can actually make that piece of the infrastructure more reliable at very small cost. In the OpenStack case: Just add a MySQL Galera cluster and a &lt;a href=&quot;http://www.pureftpd.org/project/ucarp&quot;&gt;UCARP&lt;/a&gt; virtual IP in front and you&apos;re basically done. One changed component and one additional component is a small price to pay for the gain.&lt;/p&gt;&lt;/blockquote&gt;




&lt;p&gt;The main idea here is about the &quot;failure domain&quot; pattern which splits and reduces complexity instead of increasing it using dependent systems like HA-pairs.&lt;/p&gt;




&lt;h3&gt;Some math - or not&lt;/h3&gt;




&lt;p&gt;Ever since statistics class in school, I&apos;ve had the gut feeling that using probabilities in the real world is somehow wrong. I was thrilled &lt;a href=&quot;http://www.amazon.com/The-Black-Swan-Improbable-Robustness/dp/081297381X/&quot;&gt;to read&lt;/a&gt; that my feeling was right. There is actually a good video about the whole topic by Michael Nygard: &lt;a href=&quot;http://www.infoq.com/presentations/Reliability-Engineering-Matters-Except-When-It-Doesnt&quot;&gt;Reliability engineering matters except when it doesn&apos;t&lt;/a&gt;. Please watch the whole thing or don&apos;t watch it at all. If you stop halfway, you&apos;ll get everything wrong.&lt;/p&gt;




&lt;blockquote&gt;&lt;blockquote&gt;&lt;p&gt;&quot;It is often the failover mechanisms themselves that generate the failure&quot; - Michael Nygard @ 20:20 in &quot;Reliability engineering matters except when it doesn&apos;t&quot;&lt;/p&gt;&lt;/blockquote&gt;&lt;/blockquote&gt;




&lt;p&gt;The basic heuristic for reliability of systems is:&lt;/p&gt;




&lt;p&gt;&lt;em&gt;The higher the number of dependent components =&gt; the lower the overall availability and the bigger the impact of failure&lt;/em&gt;&lt;/p&gt;




&lt;p&gt;And this is not a linear relationship. As soon as you introduce a central service of any kind (think network filesystem, SAN, the network itself), you add a dependency that everything relying on it now shares.&lt;/p&gt;




&lt;p&gt;With this in mind, think of a cloud system that uses, say, 1000 compute nodes which, for normal operation, could be rather independent. Then you add some crazy HA-failover logic to your cloud management software that is supposed to fail over VMs via live migration from any node to any node. Well, congratulations, you just &lt;em&gt;decreased&lt;/em&gt; the reliability of your overall system and &lt;em&gt;increased&lt;/em&gt; the risk and impact of failure (total system failure!) by orders of magnitude, because you just tied all compute nodes to each other.&lt;/p&gt;




&lt;p&gt;An example that shows up again and again is the failure of AWS EBS as a common source of AWS AZ failures, &lt;a href=&quot;http://www.forbes.com/sites/kellyclay/2012/06/30/aws-power-outage-questions-reliability-of-public-cloud/&quot;&gt;like the one around Christmas last year&lt;/a&gt;. The Elastic LB had a dependency on EBS. EBS fails. Now the whole region is down.&lt;/p&gt;




&lt;p&gt;&lt;img src=&quot;/assets/2013/04/09/central-fs.png&quot; alt=&quot;Central FS dependency - Can you spot the problem?&quot; /&gt;&lt;/p&gt;




&lt;p&gt;I actually did not provide any calculation here - because I think you can only miscalculate and come to the conclusion that &quot;it&apos;s actually not that bad&quot;. It is important to understand the connections between dependencies and complexity, and how scale increases the impact and likelihood of failure (even if you cannot say by how much).&lt;/p&gt;




&lt;p&gt;Another thing to consider when using the math toolkit: Most of the reliability calculations in use come from mechanical engineering, where things actually behave in a somewhat predictable manner. When you&apos;re dealing with just bare metal servers this may apply. If you add virtualisation, you have added software to the stack. And software failure characteristics are very different from those of hardware.&lt;/p&gt;




&lt;p&gt;Also, complexity and non-independent systems change the calculations drastically - and you still have to predict probabilities. System failures are not dice rolls!&lt;/p&gt;




&lt;p&gt;If you&apos;re interested in the math (and where it works and where it doesn&apos;t), watch the video mentioned above and read &lt;a href=&quot;http://papers.ssrn.com/sol3/papers.cfm?abstract_id=1850428&quot;&gt;this paper&lt;/a&gt; if you think you can somehow predict stuff in a real, complex system.&lt;/p&gt;




&lt;h3&gt;Problems with failures in big complex systems&lt;/h3&gt;




&lt;p&gt;Another problem with complexity + scale is the type of failures you get. Most of the failures start local and small but then turn into cascading failures like this &lt;a href=&quot;http://googleappengine.blogspot.de/2012/10/about-todays-app-engine-outage.html&quot;&gt;Google App Engine failure&lt;/a&gt;.&lt;/p&gt;




&lt;p&gt;There are many ways this can happen and a few ways you can protect against this - some of which are shown in &lt;a href=&quot;http://www.amazon.de/Release-It-Production-Ready-Pragmatic-Programmers/dp/0978739213/&quot;&gt;Release It!&lt;/a&gt;.&lt;/p&gt;




&lt;p&gt;The general approach here is: make failure as local as possible, and if something starts to go wrong, make it fail hard and small and contain the failure. Again: small, well-defined failure domains and independent systems.&lt;/p&gt;
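
&lt;p&gt;One containment pattern described in Release It! is the circuit breaker. The following Python sketch shows the idea under simplifying assumptions: it is single-threaded, the class and parameter names (CircuitBreaker, max_failures, reset_after) are my own, and it is not taken from any particular library.&lt;/p&gt;

```python
import time


class CircuitOpenError(Exception):
    """Raised when calls are rejected without touching the dependency."""


class CircuitBreaker:
    def __init__(self, max_failures=3, reset_after=30.0, clock=time.monotonic):
        self.max_failures = max_failures
        self.reset_after = reset_after
        self.clock = clock
        self.failures = 0
        self.opened_at = None

    def call(self, func, *args, **kwargs):
        if self.opened_at is not None:
            if self.reset_after > self.clock() - self.opened_at:
                # Still open: fail fast instead of piling onto a sick dependency.
                raise CircuitOpenError("dependency is marked as failed")
            # Cool-down elapsed: let one trial call through (half-open).
            self.opened_at = None
            self.failures = self.max_failures - 1
        try:
            result = func(*args, **kwargs)
        except Exception:
            self.failures += 1
            if self.failures >= self.max_failures:
                self.opened_at = self.clock()
            raise
        self.failures = 0
        return result


def flaky():
    # Simulated dependency that is currently unreachable.
    raise ConnectionError("backend unreachable")


breaker = CircuitBreaker(max_failures=2, reset_after=60.0)
for _ in range(3):
    try:
        breaker.call(flaky)
    except ConnectionError:
        print("call failed")
    except CircuitOpenError:
        print("failing fast")
```

&lt;p&gt;Once the breaker is open, the caller gets an immediate, well-defined error and can decide what degraded operation looks like, instead of hanging on a timeout and dragging its own callers down with it.&lt;/p&gt;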




&lt;p&gt;If you have a lot of dependencies in the infrastructure, you&apos;re not only increasing the likelihood of failure but also making recovery harder. The &lt;a href=&quot;http://en.wikipedia.org/wiki/Thundering_herd_problem&quot;&gt;Thundering herd problem&lt;/a&gt; comes to mind. &lt;a href=&quot;https://aws.amazon.com/message/680342/&quot;&gt;AWS EBS&lt;/a&gt; has been hit by this more than once.&lt;/p&gt;
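
&lt;p&gt;A common client-side mitigation for thundering herds during recovery is to retry with exponential backoff plus random jitter. This is a general technique and not something the linked AWS report prescribes in this form; the function name backoff_delays and its parameters are illustrative only.&lt;/p&gt;

```python
import random


def backoff_delays(attempts, base=0.5, cap=30.0, rng=random.random):
    """Yield one randomised wait (in seconds) per retry attempt.

    "Full jitter": each delay is drawn uniformly from 0 up to an
    exponentially growing, capped ceiling, so clients that failed at the
    same moment do not all come back at the same moment.
    """
    for attempt in range(attempts):
        ceiling = min(cap, base * 2 ** attempt)
        yield rng() * ceiling


for delay in backoff_delays(5):
    print(f"retry in {delay:.2f}s")
```

&lt;p&gt;The randomness is the important bit: plain exponential backoff without jitter still lets all clients retry in lockstep.&lt;/p&gt;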




&lt;h2&gt;The importance of partial failure&lt;/h2&gt;




&lt;p&gt;Even though the system design that is communicated to the outside is &quot;expect one complete cloud zone failure&quot;, it is important to design the system so that it can actually fail partially and work in a degraded mode. Having small failure domains (e.g. compute nodes) that fail in a predictable manner (e.g. completely and only in hardware) helps with that. A degraded operation might be: 20 of 100 nodes failed but the other 80 are fine and don&apos;t care about the failure.&lt;/p&gt;




&lt;blockquote&gt;&lt;blockquote&gt;&lt;p&gt;One of my favourite examples of partial failure/degraded operation is the Airbus flight control software with its different &lt;a href=&quot;http://en.wikipedia.org/wiki/Flight_control_modes_(electronic)#Flight_control_laws_.28Airbus.29&quot;&gt;flight control laws&lt;/a&gt;. A lot of stuff can fail and the plane is still maneuverable. As far as I know nobody ever died from a failure of this system.&lt;/p&gt;&lt;/blockquote&gt;&lt;/blockquote&gt;




&lt;h2&gt;You have to design for failure anyway&lt;/h2&gt;




&lt;p&gt;There&apos;s a great post about &lt;a href=&quot;http://it20.info/2011/04/tcp-clouds-udp-clouds-design-for-fail-and-aws/&quot;&gt;UDP clouds vs. TCP clouds&lt;/a&gt;. I like the analogy. I disagree with Massimo&apos;s conclusion, though: He puts it as if with TCP you&apos;d never really expect a &quot;connection reset&quot; or anything like that because TCP is reliable. The truth is: You have to design for failure anyway! It does not matter how reliable your underlying infrastructure is. And you can actually implement some very performant and reliable applications on top of UDP… ask some of the IT guys at Wall Street.&lt;/p&gt;




&lt;h2&gt;Business side of things&lt;/h2&gt;




&lt;p&gt;There are also costs to provide reliability at cloud scale. One of the best papers on this topic certainly is &lt;a href=&quot;http://research.google.com/pubs/pub35290.html&quot;&gt;The Datacenter as a Computer: An Introduction to the Design of Warehouse-Scale Machines&lt;/a&gt;. It basically comes to the conclusion that &lt;a href=&quot;http://samj.net/2012/03/simplifying-cloud-reliability.html&quot;&gt;Software reliability is cheaper than hardware reliability at scale&lt;/a&gt; because the additional costs of a software deployment are basically zero.&lt;/p&gt;




&lt;p&gt;One thing to consider here is also this scenario: Most web scale systems consist of a large stateless part and a small stateful part. The stateless part is easily scalable via scale out and a single component does not have to be reliable. Using this setup, it does not make sense to host the stateless part of the system on a highly reliable cloud system (if we pretend that it exists). You don&apos;t want to cast pearls before swine, do you?&lt;/p&gt;




&lt;p&gt;The main idea of a compute cloud is to be a general purpose compute environment. To make it as flexible and cheap as possible for customers to use, it does not make sense to provide a super reliable infrastructure.&lt;/p&gt;




&lt;h2&gt;Now what?&lt;/h2&gt;




&lt;p&gt;I hopefully showed why a single compute node or zone of the cloud will never be reliable. This &quot;reliable cloud&quot; problem really only exists if you think about compute instances in the cloud as &quot;physical servers&quot;. &lt;a href=&quot;http://www.jamiebegin.com/why-an-ec2-instance-isnt-a-server/&quot;&gt;They really aren&apos;t servers&lt;/a&gt;. So the real question becomes: WHAT has to be reliable, and does it have to be a certain part of the infrastructure? Everybody seems to have accepted that hard disks will fail all the time and we find ways to design around it. I think the same is true for &quot;the server&quot; in a cloud setting or, better, &quot;the cloud compute instance&quot;.&lt;/p&gt;




&lt;p&gt;So if we accept this reality and move on, it turns out we can actually build reliable applications on top of such infrastructure. Mission critical apps in the cloud? Well, &lt;a href=&quot;http://www.infoq.com/presentations/Keynote-MythBusters-Cloud-Computing-NASA&quot;&gt;NASA does it&lt;/a&gt; and they don&apos;t seem to use their rocket scientists to solve that problem.&lt;/p&gt;




&lt;p&gt;More in part 3.&lt;/p&gt;



</content><updated>2013-04-09T00:00:00+02:00</updated><category term='OpenStack'/><link href='/2013/04/09/there-will-be-no-reliable-cloud-part-2/' rel='alternate'/></entry><entry><id>/2013/04/03/there-will-be-no-reliable-cloud-part-1</id><title type='text'>There will be no reliable cloud (part 1)</title><content type='html'>&lt;p&gt;Stop wasting your time trying to find one. Stop wasting your time (and money) trying to build one. If you find a service provider that claims that they have it: Maybe question their understanding of cloud - and business.&lt;/p&gt;




&lt;p&gt;With all that free time, start to build reliable systems on top of unreliable clouds.&lt;/p&gt;




&lt;p&gt;After all these bold claims I&apos;ll convince you that this is a - some will say - sad but still valid fact of life.&lt;/p&gt;




&lt;p&gt;The main issue here is scale. Things (very generally) work very, very differently at scale. And cloud infrastructures are all about scale. Keep in mind that the complexity of systems increases exponentially, and thus the things that work fine with small systems might completely fail with bigger systems.&lt;/p&gt;




&lt;p&gt;Let&apos;s look at the different approaches to reliability that are out there and how they map to the cloud space. I&apos;ll start with the &quot;building blocks&quot; at the lowest layers and then move up to a whole cloud infrastructure (based on OpenStack) and some example services on top - because cloud for cloud&apos;s sake is a bit boring.&lt;/p&gt;




&lt;h2&gt;High Availability vs. Service resiliency&lt;/h2&gt;




&lt;p&gt;The &quot;HA&quot; term seems to be prevalent in current system design. You just add an &quot;HA&quot;-pair to your system and you&apos;re safe. At least that&apos;s how vendors seem to pitch this kind of design.&lt;/p&gt;




&lt;p&gt;There is actually &lt;a href=&quot;http://engineering.cloudscaling.com/2013/03/service-resiliency-doesnt-always-mean-ha-or-cluster/&quot;&gt;a very good presentation on the matter&lt;/a&gt; which goes into the differences of HA vs. resiliency by Randy Bias and Dan Sneddon of Cloudscaling.&lt;/p&gt;




&lt;p&gt;I was in the audience at that presentation and was very excited to hear all the things that are problematic with the HA-pair-approach of doing things: HA-pairs fail in a very catastrophic way, they don&apos;t really scale (out), etc.&lt;/p&gt;




&lt;p&gt;And I was very disappointed to hear that the actual examples in the presentation were only about resiliency of stateless services.&lt;/p&gt;




&lt;p&gt;Why does this matter? Because making stateless services resilient/available is indeed not the domain of HA-pairs. It&apos;s the poster child of scale-out architectures like the Web or the internet routing backbone at layer 3.&lt;/p&gt;




&lt;blockquote&gt;&lt;blockquote&gt;&lt;p&gt;Side note: One property of resilient systems that surfaces here is the client knowledge of more than one endpoint. Think of multiple DNS entries for a domain that hosts a webpage. In the routing example this is not that obvious, but if you look at IPv6 you can see multiple routing entries on the client side.&lt;/p&gt;&lt;/blockquote&gt;&lt;/blockquote&gt;




&lt;p&gt;So making stateless services resilient just means: replicate all the data and serve it from multiple endpoints and let the client know about multiple endpoints. There are several possibilities on how to do that from an architectural standpoint. Choose the one you like and you&apos;re done. Easy.&lt;/p&gt;
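
&lt;p&gt;As a rough illustration of &quot;let the client know about multiple endpoints&quot;, here is a small Python sketch. The function fetch_from_any, the example.org hostnames and the simulated fake_fetch transport are assumptions made for this example; in practice fetch would wrap whatever HTTP or RPC client you use.&lt;/p&gt;

```python
import random


class AllEndpointsFailed(Exception):
    """Raised when no endpoint could serve the request."""


def fetch_from_any(endpoints, fetch, rng=random):
    """Try the known endpoints in random order and return the first answer.

    `fetch` is any callable that takes an endpoint and either returns a
    response or raises OSError (for example a wrapper around an HTTP GET).
    """
    candidates = list(endpoints)
    rng.shuffle(candidates)  # spread load; no endpoint is special
    errors = {}
    for endpoint in candidates:
        try:
            return fetch(endpoint)
        except OSError as exc:
            errors[endpoint] = exc
    raise AllEndpointsFailed(errors)


def fake_fetch(endpoint):
    # Simulated transport: only one replica is currently reachable.
    if endpoint != "https://replica-2.example.org":
        raise ConnectionError(f"{endpoint} unreachable")
    return f"page served by {endpoint}"


print(fetch_from_any(
    ["https://replica-1.example.org", "https://replica-2.example.org"],
    fake_fetch,
))
```

&lt;p&gt;Because every replica serves the same data, the client does not care which one answers. That is what makes the stateless case easy.&lt;/p&gt;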




&lt;p&gt;The interesting part - which was left out of the presentation - is resilience of stateful services. And - while most services can actually be designed to be stateless - you have to store your data somewhere and be able to change it. Otherwise this whole information business would be kind of boring and useless.&lt;/p&gt;




&lt;p&gt;For &lt;a href=&quot;http://en.wikipedia.org/wiki/State_(computer_science&quot;&gt;stateful services&lt;/a&gt; you basically have two options to make them &quot;HA&quot;:&lt;/p&gt;




&lt;ul&gt;
&lt;li&gt;take a non-distributed base system and tack HA on top&lt;/li&gt;
&lt;li&gt;take a distributed system and make the right tradeoffs&lt;/li&gt;
&lt;/ul&gt;




&lt;p&gt;Let&apos;s look at them in detail.&lt;/p&gt;




&lt;h3&gt;Non-distributed base system with HA on top&lt;/h3&gt;




&lt;p&gt;This is the classic &quot;HA&quot; case: Take some stateful service that is not distributed in itself, like NFS (which is not distributed on the server side) or MySQL, add some &lt;a href=&quot;http://www.linux-ha.org/wiki/Pacemaker&quot;&gt;Pacemaker&lt;/a&gt; magic with some &lt;a href=&quot;http://www.drbd.org/&quot;&gt;DRBD&lt;/a&gt; mixed in and you&apos;re good. Or miserable.&lt;/p&gt;




&lt;p&gt;If you look into the details, most of the time you&apos;re basically cheating your way out of the &lt;a href=&quot;http://en.wikipedia.org/wiki/CAP_theorem&quot;&gt;CAP theorem&lt;/a&gt; by denying the existence of network partitions through a second network/heartbeat link.
Also, these kinds of setups are the &lt;em&gt;cause&lt;/em&gt; of failures more often than not. For example, several GitHub outages were caused by these kinds of HA failures: &lt;a href=&quot;https://github.com/blog/1261-github-availability-this-week&quot;&gt;GitHub MySQL failover failure&lt;/a&gt;, &lt;a href=&quot;https://github.com/blog/1364-downtime-last-saturday&quot;&gt;GitHub MLAG failure&lt;/a&gt;.&lt;/p&gt;




&lt;p&gt;&quot;Cluster Software&quot; causes more system outages than hardware failures or software bugs. (See &lt;a href=&quot;http://www.infoq.com/presentations/Event-Sourced-Architectures-for-High-Availability&quot;&gt;Martin Thompson&apos;s presentation on &quot;Event Sourced Architectures for High Availability&quot;&lt;/a&gt;, around 7:30.)&lt;/p&gt;




&lt;p&gt;Another thing to consider with the traditional approach: In itself it only tries to increase &lt;a href=&quot;http://en.wikipedia.org/wiki/MTBF&quot;&gt;MTBF&lt;/a&gt;. &lt;a href=&quot;http://en.wikipedia.org/wiki/Mean_time_to_recovery&quot;&gt;MTTR&lt;/a&gt; can be considered, but this is much harder to do as - from the system design standpoint - the expected failures of such a system are catastrophic. They are catastrophic because a system that is not designed to be distributed and then IS distributed can never take distributed failure conditions into account, and the best thing that can happen in case of failure is complete failure. You don&apos;t want the two HA-heads failing only &quot;half&quot;. In this case, complete failure of one head is enforced via &lt;a href=&quot;http://en.wikipedia.org/wiki/STONITH&quot;&gt;STONITH&lt;/a&gt;, and if something goes wrong further, failure of both heads would still be considered better than a split-brain scenario.&lt;/p&gt;




&lt;h3&gt;Distributed systems and the right tradeoffs&lt;/h3&gt;




&lt;p&gt;In a distributed system the components of the system have some knowledge about the &quot;distributed-ness&quot; of the whole system and can therefore accommodate certain kinds of failures. Depending on the system they can actually work with partial failures (like network partitions or the outage of several different components) or handle &lt;a href=&quot;http://en.wikipedia.org/wiki/Byzantine_fault_tolerance&quot;&gt;Byzantine failures&lt;/a&gt;.&lt;/p&gt;




&lt;p&gt;Examples of these kinds of database systems are &lt;a href=&quot;http://www.percona.com/software/percona-xtradb-cluster&quot;&gt;Percona XtraDB Cluster&lt;/a&gt; (MySQL with a distributed backend), &lt;a href=&quot;http://basho.com/riak/&quot;&gt;Riak&lt;/a&gt; (a distributed database where you can make CAP tradeoffs at the request level), &lt;a href=&quot;http://research.google.com/archive/spanner.html&quot;&gt;Google Spanner&lt;/a&gt; and &lt;a href=&quot;http://cassandra.apache.org/&quot;&gt;Cassandra&lt;/a&gt;.&lt;/p&gt;
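&lt;p&gt;One common way such request-level tradeoffs are expressed in Dynamo-style stores is through the replica count N and the per-request read and write quorums R and W. The following minimal Python sketch shows only the overlap rule; the function name is hypothetical and this is not the API of Riak or any other product.&lt;/p&gt;

```python
def quorums_overlap(n, r, w):
    """True if every read quorum shares at least one replica with every write quorum."""
    return r + w > n


# With 3 replicas, reading from 2 and writing to 2 always overlaps.
print(quorums_overlap(n=3, r=2, w=2))  # True
# Reading from 1 and writing to 1 favours availability and latency instead.
print(quorums_overlap(n=3, r=1, w=1))  # False
```

&lt;p&gt;Picking R and W per request is what lets an application ask for stronger consistency on one call and for higher availability on the next.&lt;/p&gt;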




&lt;p&gt;The database &lt;a href=&quot;http://www.datomic.com/&quot;&gt;Datomic&lt;/a&gt; is interesting in this regard. The design actually considers state at the smallest possible level: the transaction level. You get web-scale-like read scalability with stateless semantics on the read side and limited write scalability on the write side.&lt;/p&gt;




&lt;p&gt;Btw. this is not about SQL vs. NoSQL. There are distributed versions of both &quot;camps&quot; available - with different tradeoffs. I won&apos;t go into detail here.&lt;/p&gt;




&lt;p&gt;There are also block level and filesystem level systems available that are distributed from the ground up: &lt;a href=&quot;http://ceph.com/&quot;&gt;Ceph&lt;/a&gt;, &lt;a href=&quot;http://xtreemfs.org&quot;&gt;XtreemFS&lt;/a&gt;, &lt;a href=&quot;http://www.gluster.org/&quot;&gt;GlusterFS&lt;/a&gt;.&lt;/p&gt;




&lt;p&gt;I&apos;ll cover some of the tradeoffs later, when we talk about &quot;layers of the cloud cake&quot;.&lt;/p&gt;




&lt;p&gt;So these two approaches are fundamentally different. The &quot;let&apos;s accept distributed systems as a fact&quot; approach is harder, because you actually have to make tradeoffs. The &quot;classic&quot; approach tries to hide the &quot;distributedness&quot; of the system and abstract it away. This actually does work - at a certain scale for certain types of systems. Even up to pretty large ones - if you put in enough effort and money.&lt;/p&gt;




&lt;h2&gt;Definition of availability / reliability?&lt;/h2&gt;




&lt;p&gt;As we move to the whole-picture view of things, let&apos;s think about what availability and reliability actually mean. In the end, the user cares about the overall availability of &quot;the system&quot;. No user actually cares about MySQL or some web server. They care about the service they are using.&lt;/p&gt;




&lt;p&gt;Availability is not only about hardware, either: it&apos;s also about software failures (see Joe Armstrong&apos;s thesis: &lt;a href=&quot;http://www.erlang.org/download/armstrong_thesis_2003.pdf&quot;&gt;&quot;Making reliable distributed systems in the presence of software errors&quot;&lt;/a&gt;). HA systems that need to go down for &quot;maintenance&quot;, software updates or fixes are kind of a joke. They are &quot;highly available, as long as you exclude things that would bring down availability, like updates&quot;.&lt;/p&gt;




&lt;p&gt;Another thing to consider is the definition of availability at the different service layers: Is a service that is unavailable for 2 seconds still available? Is it ok if I just don&apos;t lose a request? Or is 0.5 seconds ok, but I might drop requests?&lt;/p&gt;




&lt;p&gt;If you think about it, you really have to do MTBF/MTTR considerations at the request/transaction level: &quot;Is it ok if I drop a request when no answer is there within 3ms? Try another endpoint then. If I get an answer within another 2ms, I&apos;m fine =&gt; available&quot;. Or: &quot;I do a &apos;stat&apos; system call and it&apos;s ok to wait 2 minutes, but do not ever let that call return with a failure&quot;.&lt;/p&gt;
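&lt;p&gt;The first of these two policies can be sketched in a few lines of Python. This is only an illustration under assumptions: the endpoints are plain callables that either return a response or raise TimeoutError, and every name in the block is hypothetical.&lt;/p&gt;

```python
import time


def call_with_failover(endpoints, request, per_try_timeout, total_budget):
    """Try each endpoint in order until one answers within the time budget.

    `endpoints` is a list of callables taking (request, timeout) that either
    return a response or raise TimeoutError. All names are hypothetical.
    """
    deadline = time.monotonic() + total_budget
    for endpoint in endpoints:
        remaining = deadline - time.monotonic()
        if not remaining > 0:
            break  # budget used up: the request counts as failed
        try:
            return endpoint(request, min(per_try_timeout, remaining))
        except TimeoutError:
            continue  # this endpoint was too slow, try the next one
    raise TimeoutError("no endpoint answered within the budget")


def slow(request, timeout):
    """Stand-in for an endpoint that does not answer in time."""
    raise TimeoutError("simulated slow endpoint")


def fast(request, timeout):
    """Stand-in for a healthy endpoint."""
    return "ok: " + request


# 3ms per attempt, 5ms overall, as in the example policy above.
print(call_with_failover([slow, fast], "GET /status", 0.003, 0.005))
```

&lt;p&gt;The caller sees the request as available as long as any endpoint answers inside the overall budget, regardless of which individual endpoint failed.&lt;/p&gt;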




&lt;p&gt;I&apos;ll get back to that picture in the &quot;layer&quot; discussion. But one spoiler: Most of the really highly reliable services actually solve these kinds of problems very, very high in the stack and do not care that much about reliability of the lower layers...&lt;/p&gt;




&lt;h2&gt;We have all the ingredients, so let&apos;s build that reliable cloud!&lt;/h2&gt;




&lt;p&gt;What does this whole HA/resiliency thing have to do with a reliable (or not) cloud?&lt;/p&gt;




&lt;p&gt;With the text so far you could come to the following conclusion:&lt;/p&gt;




&lt;ul&gt;
&lt;li&gt;A cloud infrastructure is a distributed system, it has some stateless and stateful components (I did not explicitly mention that. But just look at &lt;a href=&quot;http://docs.openstack.org/folsom/openstack-compute/admin/content/figures/openstack-logical-arch-folsom.jpg&quot;&gt;this picture&lt;/a&gt;. If that&apos;s not distributed… I don&apos;t know what is ;-) )&lt;/li&gt;
&lt;li&gt;We use the stateless approach for stateless parts (as shown in the Cloudscaling presentation)&lt;/li&gt;
&lt;li&gt;We throw in some distributed data store for the stateful parts (You could use the &quot;classic&quot; approach for that but why bother, if there are options like distributed MySQL servers available and arguably better)&lt;/li&gt;
&lt;li&gt;While we&apos;re at it, put everything stateful on the VM side (e.g. base images) on a distributed datastore like Ceph, XtreemFS or GlusterFS&lt;/li&gt;
&lt;/ul&gt;




&lt;p&gt;We have all the ingredients, so let&apos;s build a reliable cloud, already. It cannot be that hard!&lt;/p&gt;




&lt;p&gt;Well, if we look at the current approaches of the big cloud providers (Amazon, Google, Microsoft, HP Cloud, etc.), we can see that they follow this model only up to a point. And we can see that this &quot;distributed stateful part&quot; on the backend side (in AWS terms: &lt;a href=&quot;http://aws.amazon.com/ebs/&quot;&gt;EBS&lt;/a&gt;) is one of the main causes of outages...&lt;/p&gt;




&lt;p&gt;So the idea of &quot;let&apos;s just make that part more reliable, and while we&apos;re at it get rid of this insane availability zone business and use just ONE big reliable backend&quot; somehow seems to be wrong.&lt;/p&gt;




&lt;p&gt;I&apos;ll show you why this approach won&apos;t make sense (business and availability wise) in the next part. Feel free to comment, question and fight my thoughts and ideas. They can only get better by attacking them!&lt;/p&gt;




&lt;p&gt;&lt;a href=&quot;http://blog.hendrikvolkmer.de/2013/04/09/there-will-be-no-reliable-cloud-part-2/&quot;&gt;Read part 2&lt;/a&gt;&lt;/p&gt;



</content><updated>2013-04-03T00:00:00+02:00</updated><category term='OpenStack'/><link href='/2013/04/03/there-will-be-no-reliable-cloud-part-1/' rel='alternate'/></entry><entry><id>/2012/12/01/deploying_a_multi_node_setup_of_openStack_folsom_on_ubuntu_12.04.1_lts</id><title type='text'>Deploying a multi node setup of OpenStack Folsom on Ubuntu 12.04.1 LTS with one command</title><content type='html'>&lt;p&gt;We&apos;re going to cook some OpenStack today, so get your ingredients ready:&lt;/p&gt;




&lt;ul&gt;
&lt;li&gt;&lt;a href=&quot;http://www.vagrantup.com&quot;&gt;vagrant&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;http://git-scm.com/&quot;&gt;git&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;http://www.opscode.com/hosted-chef/&quot;&gt;A Hosted Chef Account&lt;/a&gt; (You can use your own chef server, of course. Installing a chef server e.g. via &lt;a href=&quot;http://fnichol.github.com/knife-server/&quot;&gt;knife-server&lt;/a&gt; is out of scope of this tutorial, though)&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://github.com/applicationsonline/librarian&quot;&gt;librarian-chef&lt;/a&gt; - &lt;code&gt;gem install librarian&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;http://wiki.opscode.com/display/chef/Spiceweasel&quot;&gt;spiceweasel&lt;/a&gt; - &lt;code&gt;gem install spiceweasel&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;




&lt;p&gt;To be honest, if you count the environment setup commands, it&apos;s not one command but about five for the whole process. But after the environment setup you can repeat the process using the one setup command over and over again. This is pretty useful for environment-specific testing, CI, etc.&lt;/p&gt;




&lt;blockquote&gt;&lt;p&gt;Instead of using vagrant with VirtualBox you can of course use your favourite virtualisation solution for testing or use a bare metal setup. All you need is at least two running machines with Ubuntu 12.04 on them. Instead of &lt;code&gt;vagrant up&lt;/code&gt; you would then use &lt;code&gt;knife bootstrap&lt;/code&gt; with the &lt;code&gt;single-controller&lt;/code&gt; and &lt;code&gt;single-compute&lt;/code&gt; roles.&lt;/p&gt;&lt;/blockquote&gt;




&lt;p&gt;So let&apos;s cook:&lt;/p&gt;




&lt;h2&gt;Check out the Chef cookbooks and Vagrantfile&lt;/h2&gt;




&lt;pre&gt;&lt;code&gt;git clone https://www.github.com/cloudbau/openstack-chef-repo.git
cd openstack-chef-repo
librarian-chef update
&lt;/code&gt;&lt;/pre&gt;




&lt;h2&gt;Set up Chef server environment&lt;/h2&gt;




&lt;p&gt;Using Hosted Chef is the easiest way to get started, but you can of course use your own Chef server. The Vagrantfile uses 2GB of RAM per node at the moment, so be careful not to exceed your RAM if you increase the compute node count.&lt;/p&gt;




&lt;pre&gt;&lt;code&gt;vi config.rb # Change the Chef server settings
&lt;/code&gt;&lt;/pre&gt;




&lt;h2&gt;Upload cookbooks to chef server&lt;/h2&gt;




&lt;pre&gt;&lt;code&gt;spiceweasel infrastructure.yml | sh
&lt;/code&gt;&lt;/pre&gt;




&lt;h2&gt;Deploy&lt;/h2&gt;




&lt;p&gt;Now deploy OpenStack!&lt;/p&gt;




&lt;pre&gt;&lt;code&gt;vagrant up
&lt;/code&gt;&lt;/pre&gt;




&lt;p&gt;Get a coffee, tea or whatever you like while it&apos;s cooking…&lt;/p&gt;




&lt;h2&gt;Use it&lt;/h2&gt;




&lt;p&gt;Open a browser window at &lt;code&gt;http://10.0.112.10&lt;/code&gt; to log in to your OpenStack dashboard. The default username and password are &quot;admin&quot; and &quot;secrete&quot;. If you want to change that, have a look at the attributes of the keystone cookbook.&lt;/p&gt;




&lt;h2&gt;What&apos;s next&lt;/h2&gt;




&lt;p&gt;The &lt;a href=&quot;http://www.opscode.com/solutions/chef-openstack/&quot;&gt;community chef cookbooks&lt;/a&gt; are still under development and the version used here is slightly modified so that it works with OpenStack Folsom. These changes will be merged back (pull requests are pending) to the community cookbooks and the cookbooks will certainly evolve and also cover OpenStack services like &lt;a href=&quot;http://wiki.openstack.org/Cinder&quot;&gt;cinder&lt;/a&gt; and &lt;a href=&quot;http://wiki.openstack.org/Quantum&quot;&gt;quantum&lt;/a&gt;.&lt;/p&gt;



</content><updated>2012-12-01T00:00:00+01:00</updated><category term='OpenStack'/><link href='/2012/12/01/deploying_a_multi_node_setup_of_openStack_folsom_on_ubuntu_12.04.1_lts/' rel='alternate'/></entry><entry><id>/2012/11/06/status-of-the-smartos-openstack-port</id><title type='text'>Status of the SmartOS OpenStack Port</title><content type='html'>&lt;p&gt;Since the first blog post about the SmartOS port and after my lightning talk at the Grizzly Summit, several people have asked me about the status of the port. So here&apos;s an update.&lt;/p&gt;




&lt;h2&gt;Current status&lt;/h2&gt;




&lt;p&gt;First announcement: It is not production ready. That&apos;s not because the code is bad or unstable. It is just incomplete at the moment. The whole network management code is still missing.&lt;/p&gt;




&lt;p&gt;The whole thing is not a part of core OpenStack. I&apos;m not sure if it ever will be, but that&apos;s not really a problem. The code should be easily pluggable into the current OpenStack code base. If the code is good enough and sufficiently complete, I do not see why it should not be part of the core project. With my current understanding of the code organisation, this would mean the code would be merged into the nova code tree. If, for some reason, it is not accepted, it should be easily distributable as a separate Python egg.&lt;/p&gt;




&lt;h2&gt;Future plans&lt;/h2&gt;




&lt;p&gt;Several people asked me why I&apos;m doing this, who is behind it, etc. Well, it&apos;s currently just a hobby project out of curiosity. I&apos;ve been following the Solaris/OpenSolaris/Illumos development for some time because I think they have some pretty cool and different solutions to problems (ZFS, DTrace, etc.).&lt;/p&gt;




&lt;p&gt;Besides my hobby projects I do have work to do, so the development progress on this project will not be predictable. If you are interested in the further development, just watch this blog or the wiki page mentioned below.&lt;/p&gt;




&lt;h2&gt;More information&lt;/h2&gt;




&lt;p&gt;Andy Edmonds, one of the original creators of the &lt;a href=&quot;https://blueprints.launchpad.net/nova/+spec/smartos-support&quot;&gt;Blueprint&lt;/a&gt;, created a &lt;a href=&quot;http://wiki.openstack.org/smartos&quot;&gt;SmartOS wiki page at the OpenStack wiki&lt;/a&gt;. I added some info to it - including the current code base and information about how to set up a SmartOS/OpenStack installation.&lt;/p&gt;



</content><updated>2012-11-06T00:00:00+01:00</updated><category term='OpenStack'/><link href='/2012/11/06/status-of-the-smartos-openstack-port/' rel='alternate'/></entry><entry><id>/2012/10/25/the-fear-of-openstack-fragmentation-and-the-holy-grail-of-cloud</id><title type='text'>The fear of OpenStack fragmentation and the holy grail of cloud</title><content type='html'>&lt;p&gt;Last week I was at the OpenStack Grizzly Summit and one thing was clear from the beginning: OpenStack generates a lot of interest in all kinds of businesses: Hosting, Enterprise IT, Development shops. All kinds of people want to use cloud infrastructure in one way or another to get the benefits of a more flexible infrastructure.&lt;/p&gt;




&lt;p&gt;At the Folsom Summit in April a lot of talk was about &quot;What is OpenStack?&quot;. A distribution? A cloud operating system? A framework? It seems to me that the consensus now is &quot;it&apos;s a toolkit to build cloud infrastructures&quot;.&lt;/p&gt;




&lt;p&gt;I agree. This is fairly accurate and reflects the way most people use it. There are several cloud products out there that use OpenStack in different ways to solve their problems. It also shows that it&apos;s not an off the shelf product that you just buy right now.&lt;/p&gt;




&lt;p&gt;When people say &quot;OpenStack is not mature&quot;, it&apos;s like saying &quot;The iPhone SDK is not mature&quot;. Well, it is not a product. So this assessment basically does not make sense. It is something that you use to &lt;em&gt;create&lt;/em&gt; a product.&lt;/p&gt;




&lt;p&gt;At this point it is unclear in what direction OpenStack will evolve. Certainly people use it to create products or services. And I can see that this creates a fear of fragmentation. A fear that was prominently put out by Gartner and rebutted by several OpenStack thought leaders.&lt;/p&gt;




&lt;p&gt;Saying that this will not happen is not enough, though. It will happen if we do not actively fight it. That&apos;s life. This is how the world works. Things decay. Entropy.&lt;/p&gt;




&lt;p&gt;Why is this important anyway? OpenStack is and will stay open source. It will be developed further, it will get better.&lt;/p&gt;




&lt;p&gt;It might get better. Without a clear definition or goal of what OpenStack will provide and what it will not provide, it might end up being what it is now: a toolkit. Everybody picks their tool from it and creates a frankencloud.&lt;/p&gt;




&lt;p&gt;Why is that bad? It solves your IT problem, right? Indeed it does for you, right now. And then, after the successful internal OpenStack deployment project, your CIO proclaims &quot;We&apos;re going hybrid!&quot; And you realise that just using OpenStack internally does not make your cloud compatible with the public cloud offerings out there that use OpenStack as well.&lt;/p&gt;




&lt;p&gt;And this is not an API issue. Just because OpenStack has EC2 APIs does not make OpenStack compatible with EC2. Compatibility needs more than just an API. The API is necessary but not sufficient.&lt;/p&gt;




&lt;p&gt;This is the holy grail of the cloud: Cloud interoperability.&lt;/p&gt;




&lt;p&gt;I think it can be done. But it has to be actively developed. This kind of stuff is hard and it does not just &quot;happen&quot;. At the analyst panel at the Grizzly Summit, Steve O&apos;Grady of RedMonk shared this view. He compared it with Java/J2EE: You can write once and run anywhere with Java - up to a point.&lt;/p&gt;




&lt;p&gt;&quot;100% no modifications needed&quot; portability between cloud service providers, or between internal and external clouds, will not be possible in every case. But: It &lt;em&gt;is&lt;/em&gt; possible up to a certain point and this point should be the goal.&lt;/p&gt;




&lt;p&gt;How can we achieve this? OpenStack needs an executable Test Suite that everybody can run. Similar to the &lt;a href=&quot;http://en.wikipedia.org/wiki/Technology_Compatibility_Kit&quot;&gt;Java TCK&lt;/a&gt; just without the licensing issues. The OpenStack Foundation should make this kind of thing officially available. From a technical standpoint there already is code that could be a starting point (&lt;a href=&quot;https://github.com/openstack/tempest&quot;&gt;Tempest Project&lt;/a&gt;).&lt;/p&gt;




&lt;p&gt;It should be clear that if your internal cloud deployment passes this test suite and your service provider also passes it, you should have no problems deploying your application on the public provider.&lt;/p&gt;




&lt;p&gt;&lt;a href=&quot;http://www.rackspace.com/blog/rackspace-private-cloud-certification-program-combines-product-innovation-and-enterprise-stability/&quot;&gt;Rackspace is already starting to create something like this&lt;/a&gt;. While it might make business sense for them to do so, I think this approach is wrong. This kind of offering has to come from the OpenStack Foundation! Nobody (except Rackspace) wants a &quot;Rackspace compatible&quot; cloud.&lt;/p&gt;




&lt;p&gt;Troy Toman from Rackspace said: &quot;We have a core that we know is the right thing. So how do we continue to innovate?&quot;. By putting the certification process into the OpenStack Foundation instead of Rackspace as a company.&lt;/p&gt;




&lt;p&gt;For true cloud interop we need a vendor and service provider independent entity - the OpenStack Foundation - that defines what is needed for your cloud service or private cloud product to call it &quot;OpenStack compatible&quot;.&lt;/p&gt;




&lt;p&gt;Everybody goes ahead and creates their own little or large OpenStack cloud and solves problems. This is fine, but if we really want to make the best use of the possibilities of cloud computing with OpenStack in the coming years, a lot of work has to be done to make cloud interop happen.&lt;/p&gt;




&lt;p&gt;This work can either happen at every IT shop that wants to deploy cloud services in a hybrid model - or - at a central point that already manages OpenStack related things: the OpenStack Foundation.&lt;/p&gt;




&lt;p&gt;The latter would be much more effective and would end up costing everybody less. It also would be much more in alignment with how the cloud model works than the &quot;everybody does their own&quot; model.&lt;/p&gt;




&lt;p&gt;There was a comment at the summit that &lt;a href=&quot;http://www.networkworld.com/news/2012/101712-openstack-amazon-263461.html&quot;&gt;OpenStack can tell what Amazon has to do&lt;/a&gt;, and at this point it was certainly &lt;a href=&quot;https://twitter.com/justinsheehy/status/259015836495409152&quot;&gt;arrogant&lt;/a&gt; to say so. With a test suite that defines cloud behaviour and a strong statement from the OpenStack Foundation that proclaims &quot;This is how clouds behave&quot;, Amazon&apos;s cloud might just behave the same way. And then this is not an arrogant statement anymore but just reality.&lt;/p&gt;




&lt;p&gt;Cloud interoperability is essential to the success of OpenStack in my opinion. This is certainly not the last post on that topic.&lt;/p&gt;



</content><updated>2012-10-25T00:00:00+02:00</updated><category term='OpenStack'/><link href='/2012/10/25/the-fear-of-openstack-fragmentation-and-the-holy-grail-of-cloud/' rel='alternate'/></entry><entry><id>/2012/10/05/nuking-a-big-cluster-with-pxe-boot-dban-and-crowbar</id><title type='text'>Nuking a big cluster with PXE boot, DBAN and Crowbar</title><content type='html'>&lt;p&gt;When you are deploying clusters on real hardware - again and again - to test the deployment - it is quite helpful to have the hardware in a clean state. Unfortunately there is no real &quot;Factory reset&quot; button on hard drives.&lt;/p&gt;




&lt;p&gt;However, there is a simple solution to this problem: When you install your cluster using PXE boot (like when using Crowbar), you can easily wipe all the hard drives of the whole cluster using this config.&lt;/p&gt;




&lt;blockquote&gt;&lt;p&gt;A word of warning at this point: The following steps describe how to DELETE data from your whole data centre (if you are not careful). So do backups, use with care, etc. You have been warned.&lt;/p&gt;&lt;/blockquote&gt;




&lt;h2&gt;Step 1: Getting DBAN&lt;/h2&gt;




&lt;p&gt;&lt;a href=&quot;http://sourceforge.net/projects/dban/files/dban/&quot;&gt;DBAN&lt;/a&gt; is a small custom Linux boot image that has only one purpose: Delete all disks.&lt;/p&gt;




&lt;p&gt;Download the ISO and extract the &lt;code&gt;DBAN.BZI&lt;/code&gt; file.&lt;/p&gt;




&lt;h2&gt;Step 2: Set up PXE boot&lt;/h2&gt;




&lt;p&gt;We are using this setup out of our Crowbar installation, so the PXE environment is already set up. If you use it standalone, with Cobbler or something else, adjust the paths accordingly.&lt;/p&gt;




&lt;p&gt;In the Crowbar case, create a file &lt;code&gt;/tftpboot/discovery/pxelinux.cfg/nuke&lt;/code&gt; with the following content:&lt;/p&gt;




&lt;pre&gt;&lt;code&gt;DEFAULT nuke
PROMPT 0
TIMEOUT 10
LABEL nuke
  KERNEL DBAN.BZI
  append nuke=&quot;dwipe --autonuke --method zero&quot; silent vga=785
  IPAPPEND 2
&lt;/code&gt;&lt;/pre&gt;




&lt;p&gt;Copy the &lt;code&gt;DBAN.BZI&lt;/code&gt; file to &lt;code&gt;/tftpboot/discovery/&lt;/code&gt;&lt;/p&gt;




&lt;h2&gt;Step 3: Prepare the nuke&lt;/h2&gt;




&lt;p&gt;Since we are going to nuke everything anyway and do not want chef to interfere with our evil plans, we stop chef-client on the admin node:&lt;/p&gt;




&lt;pre&gt;&lt;code&gt;bluepill chef-client stop
&lt;/code&gt;&lt;/pre&gt;




&lt;p&gt;Now we enable the self destruct button:&lt;/p&gt;




&lt;pre&gt;&lt;code&gt;cd /tftpboot/discovery/pxelinux.cfg/
ln -fs nuke default
&lt;/code&gt;&lt;/pre&gt;




&lt;h2&gt;Step 4: Nuke the cluster&lt;/h2&gt;




&lt;p&gt;To nuke the entire cluster (excluding the admin node), just delete all nodes from Crowbar - either by clicking through the UI or by using the crowbar CLI tool.&lt;/p&gt;




&lt;p&gt;Then reboot the nodes via IPMI and wait for the data destruction to commence.&lt;/p&gt;




&lt;p&gt;Nuking the admin node can be done using the DBAN ISO via some virtual IPMI drive.&lt;/p&gt;




&lt;h2&gt;Summary&lt;/h2&gt;




&lt;p&gt;You can easily reset a lot of machines using PXE boot and DBAN. When you are using Crowbar the setup shown here makes it easy to start completely fresh any time.&lt;/p&gt;




&lt;p&gt;Crowbar &quot;cleans&quot; the hard disks itself when you trigger an install, but sometimes this does not suffice. Some LVM parts remain intact and the following installation does not work properly. Using the DBAN nuke we can ensure that we really do start with empty hard drives on every node.&lt;/p&gt;



</content><updated>2012-10-05T00:00:00+02:00</updated><category term='OpenStack'/><link href='/2012/10/05/nuking-a-big-cluster-with-pxe-boot-dban-and-crowbar/' rel='alternate'/></entry><entry><id>/2012/09/26/crowbar-at-scale</id><title type='text'>Deploying OpenStack with Crowbar at scale</title><content type='html'>&lt;p&gt;I&apos;ve been working with &lt;a href=&quot;http://www.twitter.com/ehaselwanter&quot;&gt;@ehaselwanter&lt;/a&gt; on an OpenStack deployment with Crowbar for a project we are currently doing for &lt;a href=&quot;http://www.laboratories.telekom.com&quot;&gt;T-Labs&lt;/a&gt;.&lt;/p&gt;




&lt;p&gt;&quot;Scale&quot; means different things to different people. So let us be specific here: We deployed 80 nova compute nodes and 18 Swift nodes, so that adds up to close to 100 nodes with Crowbar.&lt;/p&gt;




&lt;p&gt;The installation from empty, humming bare metal to a running OpenStack cluster takes about 1 to 2 hours. Of course the development effort to make this work took a bit longer.&lt;/p&gt;




&lt;p&gt;I want to share the things we learned because either nobody has done this at this scale before or if someone did they did not talk about it and/or did not share the code changes. And code changes are necessary to get it working.&lt;/p&gt;




&lt;p&gt;The naive approach to this kind of deployment is like this: You say &quot;What works with 5 VMs or 10 physical servers surely works with 100 servers as well&quot;. Of course, it does not.&lt;/p&gt;




&lt;p&gt;We identified the following problems:&lt;/p&gt;




&lt;h2&gt;Coordination problems&lt;/h2&gt;




&lt;p&gt;Crowbar uses chef in a very specific way. The Chef run on the Crowbar admin node is basically part of the Crowbar application. It takes place in the crowbar internal installation state transitioning process. This means it creates DHCP and DNS configuration in order to serve the right PXE boot configs etc.&lt;/p&gt;




&lt;p&gt;The problem that arises at about 50 nodes (in our case) is this:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;A node is in state &quot;discovered&quot; and is triggered to be &quot;allocated&quot;&lt;/li&gt;
&lt;li&gt;This triggers a chef-client run on the to-be-installed node as well as on the admin server&lt;/li&gt;
&lt;li&gt;The chef-client run on admin takes longer than the chef-client run + reboot on the to-be-installed node, which then boots again into discovery mode&lt;/li&gt;
&lt;/ul&gt;




&lt;p&gt;This was the basic high-level problem that we identified. I guess internally all kinds of strange things happened. The symptoms were like this:&lt;/p&gt;




&lt;p&gt;&lt;img src=&quot;/assets/2012/09/26/crowbar-failure1.png&quot; height=&quot;469px&quot; width=&quot;710px&quot; alt=&quot;Boot problem 1&quot; /&gt;&lt;/p&gt;




&lt;p&gt;&lt;img src=&quot;/assets/2012/09/26/crowbar-failure2.png&quot; height=&quot;469px&quot; width=&quot;710px&quot; alt=&quot;Boot problem 2&quot; /&gt;&lt;/p&gt;




&lt;p&gt;We had the following options:&lt;/p&gt;




&lt;p&gt;a) Real solution: Coordinate state between admin node and installed node (e.g. reboot node when chef-client run on admin has succeeded and set up the installation correctly)&lt;/p&gt;




&lt;p&gt;b) Hacky, time-pressured, get-it-done solution: Make the chef-client run on admin fast enough (which is what already happens in smaller setups).&lt;/p&gt;




&lt;p&gt;Once we had wiped the engineering tears from our eyes after realising there was no time for a), we went with b) and reduced the chef-client run time on the admin node from about 3-5 minutes to about 50 seconds. To find slow chef cookbooks, we used this &lt;a href=&quot;https://gist.github.com/3712301&quot;&gt;simple script&lt;/a&gt;.&lt;/p&gt;




&lt;p&gt;The fixes were quite simple: Remove unnecessary searches and make sure that e.g. DHCP, DNS and PXE boot configs only get changed when actual changes happen. This was mostly done by enforcing order in hashes/arrays that were used to generate templates. The DNS part was a bit tricky because the &quot;zone serial number&quot; should only change when something else changes and it should not trigger changes itself.&lt;/p&gt;
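&lt;p&gt;The actual fixes live in Crowbar&apos;s Chef cookbooks (Ruby) and are not shown here. The following is only a Python sketch of the idea, with hypothetical function names and made-up host data: render the records in a fixed order so that identical input always yields identical output, and bump the serial only when the rendered records differ.&lt;/p&gt;

```python
def render_zone_records(hosts):
    """Render host records in sorted order so the output is deterministic."""
    lines = [f"{name} IN A {ip}" for name, ip in sorted(hosts.items())]
    return "\n".join(lines) + "\n"


def next_serial(current_serial, old_records, new_records):
    """Bump the zone serial only when the records themselves changed."""
    if old_records == new_records:
        return current_serial
    return current_serial + 1


# Same hosts, different insertion order: the rendered text is identical,
# so the serial (and therefore the generated file) stays unchanged.
first = render_zone_records({"node2": "10.0.0.2", "node1": "10.0.0.1"})
second = render_zone_records({"node1": "10.0.0.1", "node2": "10.0.0.2"})
print(first == second)
print(next_serial(2012092601, first, second))
```

&lt;p&gt;Because the serial is computed from the records and is not itself part of the comparison, it cannot trigger a change on its own, which is the tricky part mentioned above.&lt;/p&gt;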




&lt;h2&gt;Networking setup&lt;/h2&gt;




&lt;p&gt;The nova network config was clearly created with /24 networks in mind. With 256 addresses it does not matter if you change something for every IP in that range. With a /16 network and over 65k IPs it does matter.&lt;/p&gt;
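&lt;p&gt;The difference in size is easy to check with Python&apos;s standard &lt;code&gt;ipaddress&lt;/code&gt; module. The network prefixes below are arbitrary examples, not the ranges we actually used.&lt;/p&gt;

```python
import ipaddress

small = ipaddress.ip_network("10.0.0.0/24")
large = ipaddress.ip_network("10.0.0.0/16")

print(small.num_addresses)  # 256
print(large.num_addresses)  # 65536
# Any per-IP operation does 256 times as much work on the /16.
print(large.num_addresses // small.num_addresses)  # 256
```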




&lt;p&gt;The fixes here were straightforward and we actually were surprised that there were not more problems with the IP range and network configuration.&lt;/p&gt;




&lt;h2&gt;Bonding&lt;/h2&gt;




&lt;p&gt;… or how I stopped worrying about udev and love the crowbar approach. According to the documentation (as I understand it) udev is supposed to name the ethernet devices according to the MAC address. This should result in an ordered naming scheme. However, this is not what we observed.&lt;/p&gt;




&lt;p&gt;With crowbar you can define how network cards will be addressed in the &lt;code&gt;bc-network-template.json&lt;/code&gt; file. We use this approach to create two bonding interfaces (one for 1G and one for 10G networks). The ethX naming is not consistent across nodes, however the two 1G and two 10G interfaces are bonded and assigned to the right interface. The bonding interfaces have to be consistent across nodes (We changed crowbar to make it that way) because OpenStack refers to several network interfaces not only in the config files (which would be fine) but also in the database. With the database as central storage the naming has to be consistent so that OpenStack can find the right interfaces to do its work on.&lt;/p&gt;




&lt;p&gt;The whole bonding setup was quite hard to get working. The reason for this is the way Ubuntu (and I guess Linux networking in general) uses the networking config files: As a way to provide command line options for several tools. This means that transitioning from one setup to another requires steps like this:&lt;/p&gt;




&lt;ol&gt;
&lt;li&gt;Tear down interfaces based on current settings&lt;/li&gt;
&lt;li&gt;Change config&lt;/li&gt;
&lt;li&gt;Create new networking config from files&lt;/li&gt;
&lt;/ol&gt;




&lt;p&gt;There is already a lot of code in Crowbar to orchestrate transitions from one setup to another but we had to extend some parts - and we wanted to keep the networking management parts of the distribution.&lt;/p&gt;




&lt;h2&gt;Development workflow&lt;/h2&gt;




&lt;p&gt;We do not use the Crowbar &lt;code&gt;dev&lt;/code&gt;-tool. It does not do much that we need and that git does not already do, but it makes a lot of assumptions about how you have to manage your code. So we just use git with &lt;a href=&quot;https://github.com/nvie/gitflow&quot;&gt;git flow&lt;/a&gt;.&lt;/p&gt;




&lt;p&gt;We think that git submodules are a pain to use and should be avoided in the future. Different versions of dependencies (barclamps, chef cookbooks, packages etc.) should be managed with tools like &lt;a href=&quot;http://gembundler.com/&quot;&gt;bundler&lt;/a&gt;, &lt;a href=&quot;http://berkshelf.com/&quot;&gt;berkshelf&lt;/a&gt; or &lt;a href=&quot;https://github.com/applicationsonline/librarian&quot;&gt;librarian&lt;/a&gt;. We&apos;re working on a setup that uses these tools in the Crowbar context and we are glad to see that submodules already seem to have vanished from the current development branch.&lt;/p&gt;




&lt;p&gt;For the initial setup it is fine to use the ISO installation approach. To change things on the fly (and/or make Crowbar work with an actual 100 nodes) we needed a way to change things quicker. Crowbar uses Chef and Chef can do that, so we just configure &lt;code&gt;knife&lt;/code&gt; to point to the crowbar server and to include all cookbook directories from the barclamp source tree and we are good to go. A simple &lt;code&gt;knife cookbook upload nova&lt;/code&gt; and the latest OpenStack config changes can be applied.&lt;/p&gt;




&lt;h2&gt;Screenshots&lt;/h2&gt;




&lt;p&gt;Everybody loves screenshots:&lt;/p&gt;




&lt;p&gt;&lt;img src=&quot;/assets/2012/09/26/crowbar-dashboard.png&quot; height=&quot;909px&quot; width=&quot;710px&quot; alt=&quot;Crowbar dashboard&quot; /&gt;&lt;/p&gt;




&lt;p&gt;&lt;img src=&quot;/assets/2012/09/26/ganglia-overview.png&quot; height=&quot;577px&quot; width=&quot;710px&quot; alt=&quot;Ganglia overview&quot; /&gt;&lt;/p&gt;




&lt;p&gt;&lt;img src=&quot;/assets/2012/09/26/ganglia-overview2.png&quot; height=&quot;577px&quot; width=&quot;710px&quot; alt=&quot;Ganglia overview&quot; /&gt;&lt;/p&gt;




&lt;h2&gt;Next steps&lt;/h2&gt;




&lt;p&gt;We&apos;ve just finished the code for internal use. It is not public yet. We will start to integrate our changes into the Crowbar open source repositories in the next few days.&lt;/p&gt;




&lt;p&gt;We would love to hear or read other stories from deployments at this scale. You can reach us via Twitter: &lt;a href=&quot;http://www.twitter.com/hvolkmer&quot;&gt;@hvolkmer&lt;/a&gt; and &lt;a href=&quot;http://www.twitter.com/ehaselwanter&quot;&gt;@ehaselwanter&lt;/a&gt;.&lt;/p&gt;



</content><updated>2012-09-26T00:00:00+02:00</updated><category term='OpenStack'/><link href='/2012/09/26/crowbar-at-scale/' rel='alternate'/></entry><entry><id>/2012/09/14/reducing-development-production-parity-for-openstack-with-smartos-zones</id><title type='text'>Reducing development-production parity for OpenStack development with SmartOS zones</title><content type='html'>&lt;p&gt;Reducing the gap between development and production (&lt;a href=&quot;http://www.12factor.net/dev-prod-parity&quot;&gt;dev/prod parity&lt;/a&gt;) has several advantages that are listed and explained on the linked page. Without a useful reduction of that gap, &lt;a href=&quot;http://continuousdelivery.com/&quot;&gt;continuous delivery&lt;/a&gt; is impossible or at least akin to madness.&lt;/p&gt;




&lt;p&gt;In the context of OpenStack development and deployment, dev/prod parity does not seem to be one of the most addressed problems right now. From what I hear, the de facto standard development environment for OpenStack is &lt;a href=&quot;http://www.devstack.org&quot;&gt;devstack&lt;/a&gt;. Devstack is a perfect fit to get OpenStack running and to start developing, but it is not the way people deploy OpenStack in production. This leads to all kinds of problems that could be avoided.&lt;/p&gt;




&lt;p&gt;Closing the gap between development and production is very important, and that&apos;s why I am thinking about it at this point even though a SmartOS based OpenStack production deployment seems to be a thing of the future right now.&lt;/p&gt;




&lt;h2&gt;Development setup&lt;/h2&gt;




&lt;p&gt;It turns out that SmartOS is fantastic for addressing this problem. Let&apos;s see what the current development setup looks like:&lt;/p&gt;




&lt;p&gt;&lt;img src=&quot;/assets/2012/09/14/single-host.png&quot; alt=&quot;Single Host Setup&quot; /&gt;&lt;/p&gt;




&lt;p&gt;I basically start up &lt;code&gt;nova-compute&lt;/code&gt; in the global zone to have access to &lt;code&gt;vmadm&lt;/code&gt; and start all the other services in separate zones. At first this seems to be a hassle and too much for development. But it forces you to think of all the things that will go wrong in production. For example: &lt;code&gt;nova-compute&lt;/code&gt; needs to be able to access the database (which will be fixed in the future). So I need to set up the MySQL credentials in a way that supports that.&lt;/p&gt;




&lt;h2&gt;Multi node setup&lt;/h2&gt;




&lt;p&gt;What would this model look like in a more production-like environment?&lt;/p&gt;




&lt;p&gt;&lt;img src=&quot;/assets/2012/09/14/multi-host.png&quot; alt=&quot;Multi Host setup&quot; /&gt;&lt;/p&gt;




&lt;p&gt;You can see that the basic service separation is the same. From a service standpoint it does not matter whether the zones are deployed on a single development host or on several physical hosts in the multi node production case.&lt;/p&gt;




&lt;p&gt;Being able to develop OpenStack in a production-like setup will reduce the likelihood of surprises when it comes to deployment. SmartOS zones help a lot to achieve this goal.&lt;/p&gt;




&lt;h2&gt;Do you want to know more?&lt;/h2&gt;




&lt;p&gt;If you want to know more about SmartOS and OpenStack, &lt;a href=&quot;https://www.openstack.org/summit/san-diego-2012/vote-for-speakers/&quot;&gt;vote for my talk proposal &quot;Porting OpenStack to SmartOS&quot;&lt;/a&gt; at the upcoming OpenStack San Diego Summit.&lt;/p&gt;



</content><updated>2012-09-14T00:00:00+02:00</updated><category term='OpenStack'/><link href='/2012/09/14/reducing-development-production-parity-for-openstack-with-smartos-zones/' rel='alternate'/></entry><entry><id>/2012/09/07/why-smartos-as-an-openstack-base-os</id><title type='text'>Why SmartOS as an OpenStack base operating system?</title><content type='html'>&lt;p&gt;When SmartOS was first announced about a year ago, I downloaded the ISO, booted it in VMware, logged in and then… nothing. What is this? It&apos;s small, it is not supposed to be installed on disk. What do I do with it? What is so special about it? It is just an Illumos distro - a small and strange one. I did not get it.&lt;/p&gt;




&lt;p&gt;I am now &lt;a href=&quot;http://blog.hendrikvolkmer.de/2012/08/31/porting-openstack-to-smartos/&quot;&gt;porting OpenStack to SmartOS&lt;/a&gt; and I think it is a perfect fit for that purpose. It is truly a Cloud OS. What does that mean? Let&apos;s go through the features of SmartOS that make it a perfect fit to be run as a Cloud Base OS.&lt;/p&gt;




&lt;h2&gt;ZFS&lt;/h2&gt;




&lt;p&gt;The &lt;a href=&quot;http://en.wikipedia.org/wiki/ZFS&quot;&gt;ZFS&lt;/a&gt; storage model is based on copy-on-write, which means that snapshots and clones are essentially free. This is fantastic for a cloud compute node. Say you have a small set of defined VM base images and then spin up 20 VMs of Ubuntu 12.04. The Ubuntu image takes up about 500 MB of disk space. How much do 20 VMs of Ubuntu 12.04 use? Simple math, right? Well, with ZFS these 20 VMs take 500 and a few MB. Total. Of course the savings shrink as the instances diverge, but it is still impressive and very useful. Also, spinning up instances means you only have to clone the base image in ZFS, which takes a few seconds at most, compared to copying it on a traditional file system. Somewhere in the internets I can hear someone shouting: &quot;My SAN has done the same thing since the 80s, smartass&quot;. Sure, but it costs quite a bit more and you also get a lot of &lt;a href=&quot;http://radar.oreilly.com/2012/04/complexity-vs-simplicity.html&quot;&gt;complexity&lt;/a&gt;.&lt;/p&gt;
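
&lt;p&gt;The arithmetic behind this can be written down in a few lines. This is a rough model only: the function names are made up for illustration, clone metadata (the &quot;few MB&quot;) is ignored, and the 100 MB of divergence per VM is an assumed number, not a measurement.&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;
```python
def full_copy_mb(image_mb, vm_count):
    """Disk usage when every VM gets its own full copy of the base image."""
    return image_mb * vm_count


def cow_clone_mb(image_mb, vm_count, diverged_mb_per_vm):
    """Disk usage with copy-on-write clones: one shared image plus per-VM changes."""
    return image_mb + vm_count * diverged_mb_per_vm


if __name__ == "__main__":
    print(full_copy_mb(500, 20))       # 10000
    print(cow_clone_mb(500, 20, 0))    # 500 (freshly cloned, nothing written yet)
    print(cow_clone_mb(500, 20, 100))  # 2500 (assumed 100 MB of changes per VM)
```
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Even with every instance having written 100 MB of its own data, the clones in this model use a quarter of the space of full copies.&lt;/p&gt;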




&lt;p&gt;The local storage needs for the base operating system get even better when you use Zones (see below).&lt;/p&gt;




&lt;h2&gt;DTrace&lt;/h2&gt;




&lt;p&gt;While the open source SmartOS (to my knowledge) lacks the cool graphical reporting possibilities based on DTrace, it still contains the real &lt;a href=&quot;http://dtrace.org/blogs/about/&quot;&gt;DTrace&lt;/a&gt; capabilities. Anyone who has had to hunt a memory leak or strange process behaviour in production will know what it is worth to have a good tracing capability at hand. DTrace will enable you to find performance and other problems in a stable and very low overhead way.&lt;/p&gt;




&lt;h2&gt;Zones&lt;/h2&gt;




&lt;p&gt;Zones let you securely partition your OS without the overhead of another hypervisor (&lt;a href=&quot;http://en.wikipedia.org/wiki/Operating_system-level_virtualization&quot;&gt;OS level virtualisation&lt;/a&gt;). In SmartOS a zone is always the outer &quot;hull&quot; of virtualisation. Inside it is either just another SmartOS instance or a qemu process with KVM that runs any other operating system. In combination with ZFS and Crossbow this is a fantastic basis for virtual environments.&lt;/p&gt;




&lt;p&gt;As Zones have minimal overhead they provide an ideal foundation for PaaS services like databases, load balancers etc. In an OpenStack context this would allow us to just use a special VM image that translates to a zone for a DB. We could thus avoid the whole &lt;a href=&quot;https://lists.launchpad.net/openstack/msg15314.html&quot;&gt;RedDwarf/RedDwarf lite-back-and-forth-disaster&lt;/a&gt;.&lt;/p&gt;




&lt;h2&gt;KVM&lt;/h2&gt;




&lt;p&gt;Using KVM allows us to run (almost) any x86 operating system using the Intel VT-x and VT-d extensions with very little overhead. SmartOS has integrated KVM in a way that is easy to use. Actually I&apos;m pretty glad that they did not use libvirt as an abstraction layer but created their own.&lt;/p&gt;




&lt;h2&gt;SMF - Service Management Facility&lt;/h2&gt;




&lt;p&gt;With &lt;a href=&quot;http://en.wikipedia.org/wiki/Service_Management_Facility&quot;&gt;SMF&lt;/a&gt; you can define services and their dependencies and let SMF manage them for you. I do believe that dependencies are important and a good way to model things for services (in contrast to what the &lt;a href=&quot;http://upstart.ubuntu.com/cookbook/#id74&quot;&gt;Upstart&lt;/a&gt; designers think). Managing services in this model has proven to work well, e.g. with the &lt;a href=&quot;http://www.erlang.org/doc/design_principles/des_princ.html&quot;&gt;Erlang Supervision Tree&lt;/a&gt;. With SMF you can define dependencies and restart behaviours of services as well as alerts when stuff goes awry.&lt;/p&gt;
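
&lt;p&gt;To illustrate why explicit dependencies are a useful model, here is a small sketch of dependency-ordered startup. It is not how SMF is implemented and it does not use SMF&apos;s manifest format; the service graph is a made-up example and the ordering is done with a plain topological sort from the Python standard library.&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;
```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical service graph: each service maps to the services it depends on.
DEPENDENCIES = {
    "mysql": set(),
    "rabbitmq": set(),
    "nova-api": {"mysql", "rabbitmq"},
    "nova-compute": {"mysql", "rabbitmq"},
}


def start_order(dependencies):
    """Return a start order in which every dependency precedes its dependents."""
    return list(TopologicalSorter(dependencies).static_order())


if __name__ == "__main__":
    print(start_order(DEPENDENCIES))
```
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Once the dependencies are declared, the start order falls out of the graph instead of being hand-maintained in init scripts.&lt;/p&gt;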




&lt;h2&gt;Crossbow&lt;/h2&gt;




&lt;p&gt;Illumos comes with its own network virtualisation layer (basically a virtual switch) called Crossbow. In SmartOS the network virtualisation is integrated in the &lt;code&gt;vmadm&lt;/code&gt; tool and works seamlessly. The virtualisation is based on VLANs: each tenant will get their own VLAN. Crossbow was way ahead of the competition when it was released. With &lt;a href=&quot;http://openvswitch.org&quot;&gt;Open vSwitch&lt;/a&gt; the Linux world has caught up and maybe even surpassed Crossbow. It will be interesting to see how the development of these technologies will continue.&lt;/p&gt;




&lt;h2&gt;SmartOS VM tools&lt;/h2&gt;




&lt;p&gt;&lt;code&gt;imgadm&lt;/code&gt; and &lt;code&gt;vmadm&lt;/code&gt; are tools to manage images and VMs. They are clearly written to be used as part of a cloud platform. No config files with strange syntax or super long command line options. Instead these tools work with short and clear commands that take JSON files as input to do the real work. This is fantastic when used as an API from a cloud orchestration layer like OpenStack. And it feels a lot more like an API than libvirt (I&apos;ll save the libvirt &quot;API&quot; rant for another time... XML snippets as arguments. Really?).&lt;/p&gt;
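
&lt;p&gt;As a rough sketch of what &quot;JSON as the API&quot; means for an orchestration layer, the following builds a zone description as a plain dictionary and serialises it. The helper name is hypothetical, the UUID is a placeholder, and the property names follow my reading of the &lt;code&gt;vmadm&lt;/code&gt; man page, so treat them as assumptions that may differ between SmartOS releases.&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;
```python
import json


def build_zone_payload(alias, image_uuid, memory_mb, quota_gb):
    """Build a JSON document describing a SmartOS zone (illustrative only)."""
    payload = {
        "brand": "joyent",               # OS-level zone; "kvm" would mean a KVM guest
        "alias": alias,
        "image_uuid": image_uuid,        # assumption: older releases may use another key
        "max_physical_memory": memory_mb,
        "quota": quota_gb,
        "nics": [{"nic_tag": "admin", "ip": "dhcp"}],
    }
    return json.dumps(payload, indent=2)


if __name__ == "__main__":
    # The UUID below is a placeholder, not a real image.
    print(build_zone_payload("web01", "00000000-0000-0000-0000-000000000000", 512, 10))
```
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Written to a file, a document like this is the kind of input &lt;code&gt;vmadm create -f&lt;/code&gt; expects, which is why driving the tool from Python needs little more than a dictionary and &lt;code&gt;json.dumps&lt;/code&gt;.&lt;/p&gt;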




&lt;h2&gt;A true cloud OS&lt;/h2&gt;




&lt;p&gt;The - at first look strange - model of booting the base OS from a USB key or via PXE is a fantastic fit for cloud environments and makes a lot of problems go away. OS updates? Just reboot. How to install the OS? Don&apos;t. Just use a RAM disk. Why waste space on disk when you can have a RAM disk of 250 MB with the whole OS in it?&lt;/p&gt;




&lt;p&gt;As you can see I&apos;m pretty excited about SmartOS in this context. In combination with OpenStack this might be the best open source cloud orchestration stack that you can get.&lt;/p&gt;



</content><updated>2012-09-07T00:00:00+02:00</updated><category term='OpenStack'/><link href='/2012/09/07/why-smartos-as-an-openstack-base-os/' rel='alternate'/></entry><entry><id>/2012/08/31/porting-openstack-to-smartos</id><title type='text'>Porting OpenStack to SmartOS</title><content type='html'>&lt;h2&gt;General idea&lt;/h2&gt;




&lt;p&gt;&lt;a href=&quot;http://smartos.org&quot;&gt;SmartOS&lt;/a&gt; is a modern operating system that was actually created to run cloud orchestration software. Joyent uses it in their commercial SmartDataCenter software. So it makes perfect sense to port &lt;a href=&quot;http://openstack.org&quot;&gt;OpenStack&lt;/a&gt; to it.&lt;/p&gt;




&lt;p&gt;In fact, this idea is so obvious that there is a &lt;a href=&quot;https://blueprints.launchpad.net/nova/+spec/smartos-support&quot;&gt;blueprint&lt;/a&gt; describing this.&lt;/p&gt;




&lt;p&gt;Thijs Metsch wrote a three-part (&lt;a href=&quot;http://www.nohuddleoffense.de/2012/02/12/smartstack-smartos-openstack-part-1/&quot;&gt;1&lt;/a&gt;, &lt;a href=&quot;http://www.nohuddleoffense.de/2012/02/15/smartstack-smartos-openstack-part-2/&quot;&gt;2&lt;/a&gt;, &lt;a href=&quot;http://www.nohuddleoffense.de/2012/02/28/smartstack-smartos-openstack-part-3/&quot;&gt;3&lt;/a&gt;) blog post series about this endeavour. But it ends where it gets interesting: the part where you would actually start VMs, copy images, set up networking etc.&lt;/p&gt;




&lt;p&gt;I built on his work and started where he stopped.&lt;/p&gt;




&lt;h2&gt;The plan&lt;/h2&gt;




&lt;ol&gt;
&lt;li&gt;Create a Nova hypervisor backend for SmartOS based on the current Folsom (master) branch of OpenStack Nova&lt;/li&gt;
&lt;li&gt;Integrate networking through Quantum&lt;/li&gt;
&lt;li&gt;???&lt;/li&gt;
&lt;li&gt;Profit&lt;/li&gt;
&lt;/ol&gt;




&lt;h2&gt;Current status&lt;/h2&gt;




&lt;p&gt;I can currently start and stop VMs (both SmartOS zones and KVM based VMs) through the OpenStack API. Glance integration is there (images from Glance will be put into ZFS the way &lt;code&gt;imgadm&lt;/code&gt; wants them). On the SmartOS side I use the &lt;code&gt;vmadm&lt;/code&gt; and &lt;code&gt;imgadm&lt;/code&gt; tools as an integration API. Basic networking also works. There&apos;s still quite a lot of work to do to get security groups, floating IPs etc. working.&lt;/p&gt;




&lt;p&gt;Here are two screenshots that show how it currently looks:&lt;/p&gt;




&lt;p&gt;&lt;img src=&quot;http://blog.hendrikvolkmer.de/assets/2012/08/31/smartos1.png&quot; alt=&quot;Booting a VM&quot; /&gt;&lt;/p&gt;




&lt;p&gt;&lt;img src=&quot;http://blog.hendrikvolkmer.de/assets/2012/08/31/smartos2.png&quot; alt=&quot;Logging into an OpenStack-started zone&quot; /&gt;&lt;/p&gt;




&lt;p&gt;Expect more info to come in the coming days and weeks.&lt;/p&gt;



</content><updated>2012-08-31T00:00:00+02:00</updated><category term='OpenStack'/><link href='/2012/08/31/porting-openstack-to-smartos/' rel='alternate'/></entry></feed>