NETWAYS Blog

stackconf 2022 | How to Be a Good Corporate Citizen in Open Source

Dr. Dawn Foster is a Unix sysadmin for VMware, did her doctorate on Linux kernel development and has been pursuing her tech career for over twenty years! Her main focus is community and open source work. In her talk she showed us how to be a good corporate citizen. If you would prefer to watch a recording, head over to YouTube to listen to her talk. If you prefer to read all about it, read on!

This is what I’ve learned from Dawn in her talk “How to Be a Good Corporate Citizen in Open Source”:

Collaboration in OSS Projects: Individuals, Companies, Communities

Intro Slide with Pictures of Dr Foster

Open source communities have a variety of different people involved.

A project has developers, a release team, localisation and translation teams, marketing, community managers, tech writers, users and lots of other people involved. All of these people are working together as one community towards the goal of a good project.

 

Balance

The community that works on the project decides where the project goes. An outside corporate entity cannot force it to adopt changes it doesn’t want or that go against the direction of the project. As a company, you need to align your needs with the needs of the project. This is important to understand when making contributions, so you don’t put your employees in a position where they have to harm either the project or their employment.

Contribution Strategy and Plans

Aligning Goals

The first and most important step is to make sure that your company’s goals and the project’s goals are in alignment. If they are, it will be much easier to justify putting resources and effort into the project. It also helps the team working on the project understand the importance of their work.

Finding and focusing on the right projects is an important point. Look at your operations team and the tools they use: those might be a great fit to support. Are there open source development or deployment tools that you could support? These questions help you figure out what to support and make a stronger case to your superiors for supporting those projects.

Communication

Make sure that all of your teams that work on open source projects communicate with each other, to avoid having conflicts in public open source projects. If your vision is aligned, you will have a lot fewer issues. You can even help organise meetups, provide discussion channels and events to further help foster productive discourse.

Which Projects?

Slide "Which Projects?"

Once you know which projects to support, you need people to contribute. Maybe you already have people who have contributed in the past. Keep in mind that contributing to open source projects needs a different skill set than working on internal projects. For example, contributors need to be comfortable receiving and reacting to feedback in public.

Staffing

You can also hire people who are already contributing to those projects, but be careful: you do not want to get a reputation for aggressively poaching contributors from projects. It takes a bit of nuance to make it known that you are hiring for a project without coming on too strong.

Guidelines

Have guidelines and best practices ready for people who engage in open source projects. Try to find a good balance between providing help and guidance on the one hand and being overbearing or scaring your employees away from making contributions on the other. Help engineers understand what they want to do and why.

Measure Success

You also want to make sure you can measure outcomes and results. How do you pull that kind of data? It really depends on what you want to achieve. For the goal “improving performance”, check the software’s performance data. For the goal “gain influence”, check how many of your employees hold meaningful positions in the project. You might also want to measure a little more than you currently need, so you have some extra data at hand in case your focus shifts in the future.

Making Contributions as a Good Corporate Citizen in OSS

Slide "Getting Started"

Before hopping into a new project, look around a little to get a feel for how the community works. Look at the documentation, especially the contribution docs and the code of conduct. All projects work differently, and understanding how things work helps you avoid violating any community norms.

Start with small contributions and work your way up, instead of building one big addition to the project and dumping it on the maintainers unannounced.

Learn from Feedback

When you start participating in a project, you need to expect feedback. Sometimes feedback will be kind, sometimes it will be worded a bit more harshly. What you need to do is stay focused on the changes you need to make to your contribution, stay kind, and maybe have someone proofread what you write to catch any unintended harshness in your answers. Try not to get defensive; instead, clarify what you mean and iterate.

Work with the Community

You might want to connect with people that worked on similar areas that you are touching on, and collaborate. Get in touch with the people who run a project and discuss strategies with them, to offer better help and be more productive in the process!

Break up your work into smaller contributions to make it easier for the maintainers to work with you and to iterate through the process.

Remember that you have a lot less control over other people working on the project, unlike in a company where you are able to escalate issues to managers. Meet people where they are and be kind!

Relationships

Having good relationships with the people you collaborate with makes working together a lot easier and more fun. Conferences and meetups are very important, because many issues are easier to solve when you can talk about them in person. Knowing the human being on the other side of your screen can make a big difference! When you need to do something new or have questions, knowing someone who can point you in the right direction is incredibly valuable.

Upstream your Patches

When you maintain your patches internally, every project update carries the risk that someone forgets to reapply them, or has to fix conflicts where both upstream and the patch touched the same code. If you get your patches into the upstream repository, you will not run into those issues, and you might help other people as well!

Maintenance Expectations

If you are adding larger features to a project’s codebase, make sure that you can help with its maintenance and have someone constantly assigned to that task. If you make additions to a project and then bail on it, you create a big workload for the maintainers, which will make you and your organisation look bad and future contributions to this or other projects will be a lot less well received.

Open Source Your Software

If you are open sourcing your projects, don’t just dump dead projects onto the internet and hope someone is going to take over. This is naive at best, and it will also make your company look bad. Take care of your software the same way you would under a proprietary licence!
Maintaining a project with the community involved is a lot of work, but it pays off in the long run. Tend to your pull requests and issues and you will reap the benefits of the hard work others have put into it.

If you have read through this all, you’ll be happy to hear that there is more content like this on this blog – or if you also enjoy a video about it, check out our YouTube channel with lots of recordings from our conferences!

Take a look at our conference website to learn more about stackconf, check out the archives and register for our newsletter to stay tuned!

Feu Mourek
Developer Advocate

Feu spent their childhood in the beautiful Steigerwald before setting out to explore the world. Since September 2016, Feu has supported Icinga, first as a developer and since 2020 as a Developer Advocate, and NETWAYS as a Git and GitLab trainer. Feu spends most of their free time playing video games and pen-and-paper role-playing games, designing houses (that they will never be able to afford), or casually cruising around in their convertible.

stackconf 2022 | Spotify’s outage of 8.3.2022, explained

We’re still excited about stackconf 2022, our Open Source Infrastructure Conference, which took place in person for the very first time this year in Berlin. We had many awesome speakers on stage, and I would like to present one of their outstanding talks to you below.

Spotify had one of its most disruptive outages in recent history on the evening of Tuesday, 8 March 2022, starting at around 19:00 CET. It resulted in over an hour of downtime and in users getting logged out. Kat Liu, Senior Software Engineer at Spotify Berlin, explained how this incident unfolded.

Kat was enjoying her day off for International Women’s Day when she received the first alert. Within a short time she received many more alerts, and it became clear that there was a serious issue. Hundreds of people posted online that they had been logged out and could no longer log in.

The Outage

The first sign of trouble was a warning with the message “Failed to resolve name”. The reason for this warning was that the internal system could not resolve the name of service2 because service2 was down, which caused the outage.

The Fix

The solution to this problem was very simple: revert all services back to using the Nameless system. Service was mostly restored by 19:40 CET.

But why were users logged out?

In the call path Kat showed, service1 calls service2. Since service2 was not available, an incorrect NOT_FOUND error was returned, causing the user to be logged out and unable to log back in.

This error was later changed to UNAVAILABLE.
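
The difference between the two errors can be sketched in a few lines of Python. This is a toy model, not Spotify's code: only the status names NOT_FOUND and UNAVAILABLE come from the talk, while the functions, the `DependencyDown` exception and the logout rule are assumptions made for illustration.

```python
from enum import Enum


class Status(Enum):
    OK = "OK"
    NOT_FOUND = "NOT_FOUND"
    UNAVAILABLE = "UNAVAILABLE"


class DependencyDown(Exception):
    """Raised when the downstream service cannot be reached at all."""


def lookup_session(session_id, sessions, dependency_up=True):
    # Hypothetical stand-in for service2: returns the user for a session.
    if not dependency_up:
        raise DependencyDown("service2 unreachable")
    return sessions.get(session_id)


def check_session(session_id, sessions, dependency_up=True):
    # Hypothetical stand-in for service1: maps the lookup result to a status.
    try:
        user = lookup_session(session_id, sessions, dependency_up)
    except DependencyDown:
        # We could not ask, so we do not know whether the session exists.
        return Status.UNAVAILABLE
    return Status.OK if user is not None else Status.NOT_FOUND


def client_should_log_out(status):
    # Only a definite "this session does not exist" justifies a logout.
    return status is Status.NOT_FOUND
```

With this mapping, a client that sees UNAVAILABLE can keep its session and retry later, whereas NOT_FOUND tells it that the session is really gone.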

The Aftermath

An outage lasting about 40 minutes disrupted about 50 million login sessions.

Over the following days and weeks, 3 million new duplicate accounts were created, because many users did not log in to Spotify regularly and had forgotten their credentials.

That was just a short summary of Kat Liu’s talk at stackconf 2022. You can watch her full talk on our YouTube channel. Enjoy!
And don’t forget to register for the stackconf newsletter to stay tuned about the upcoming plans for next year’s stackconf! See you there!

Sukhwinder Dhillon
Developer

Sukhwinder successfully completed his apprenticeship as an IT specialist for application development at NETWAYS in 2021. In his free time he enjoys cycling, meeting friends, jogging, or sitting at the computer and learning something new.

stackconf 2022 | DevOps or DevX – Lessons We Learned Shifting Left the Wrong Way

stackconf 2022 was a complete success! On July 19 and 20, our conference took place in Berlin, and we very much enjoyed the event, which was held on site for the very first time! stackconf was all about open source infrastructure solutions in the spectrum of continuous integration, container, hybrid and cloud technologies. We’re still excited about our expert speaker sessions. Below you get a deeper insight into one of our talks.

We kicked off the lecture program with a talk by Hannah Foxwell on “DevOps or DevX – Lessons We Learned Shifting Left The Wrong Way”. Here is what I’ve learned from Hannah:

Once Upon A Time

DevOps is such a common term now that it has almost lost its precise meaning. Once upon a time there were two teams, Devs and Ops, with different missions and goals: rapid development vs. stable user experience. Changes were simply handed over from one team to the other, and it took great effort to get even the smallest features into production in a stable way. For sure: This needed to change!

Some people felt the problem was the Ops team, and so NoOps became a thing. This misconception came from thinking that the Ops team didn’t care about users because it didn’t want to release new features fast enough. As a result, more and more typical Ops tasks like backup, monitoring or cost management were handed over to developers. At a certain point, these additional tasks became too much for the dev team, and some developers were unhappy with them as well.

Focus on Team Health

According to a report by Haystack Analytics, 83% of all developers suffer from burnout, mostly triggered by the demands of having to learn and consider more and more technologies and areas.
This is where HumanOps deserves more attention again, to put the focus back on the health of the team.

Just like the old way of splitting everything into silos, the NoOps approach was the wrong way to go. Instead, it’s important to use mixed teams with a product-owner mentality for the different layers. Each team is responsible for delivering the best possible experience for their users.

Hannah also touched on how important the right level of site reliability is and how it can impact the team. With a reliability target of 99% over 28 days, you have about 400 minutes of allowed downtime, which is enough time for manual intervention. The higher the reliability target, the less time and the more stress the team has, until only automatic interventions can stay within the allowed time. At that point, no human can react fast enough.
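
The 400-minute figure follows from simple arithmetic: 28 days are 40,320 minutes, and 1% of that is roughly 403 minutes. A minimal Python sketch (the function name is made up for illustration) shows how quickly that time shrinks with every additional nine:

```python
def downtime_budget_minutes(availability_percent, window_days=28):
    """Minutes of downtime allowed in the window for a given availability target."""
    window_minutes = window_days * 24 * 60
    return window_minutes * (100 - availability_percent) / 100


for target in (99, 99.9, 99.99, 99.999):
    print(f"{target}% -> {downtime_budget_minutes(target):.1f} minutes")

# 99% -> 403.2 minutes
# 99.9% -> 40.3 minutes
# 99.99% -> 4.0 minutes
# 99.999% -> 0.4 minutes
```

At 99.99% only about four minutes remain for the whole 28-day window, which is why manual intervention stops being realistic at that point.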

On Site Reliability

But you also have to ask whether the user actually needs this level of reliability. Many users don’t even notice a short disruption, and if they do, some aren’t even bothered by it. Contrast this with the cost and effort of taking measures. Depending on the level of site reliability needed, monitoring measures range from user input to active monitoring to automatic rollbacks.

You also have to decide how to allocate this downtime at each level: the closer you are to the physical hardware, the lower the downtime needs to be.
Site reliability should not be the responsibility of a single team; this is where all teams need to work together.
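
One way to reason about allocating downtime across levels is to multiply the availabilities of the layers. This assumes that the layers depend on each other in series and fail independently, which is a simplification and not something taken from the talk. The layer names and numbers below are purely illustrative:

```python
import math


def combined_availability(layer_availabilities):
    """Availability of a stack whose layers all have to be up at the same time."""
    return math.prod(layer_availabilities)


# Hypothetical targets: the closer to the hardware, the stricter the target.
layers = {"hardware": 0.9999, "platform": 0.9995, "application": 0.999}
total = combined_availability(layers.values())
print(f"combined availability: {total:.4%}")
```

Under these assumptions the whole stack is always less available than its weakest layer, which is why the lower layers need the tightest targets.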

Finally, Hannah explained the security aspects that need to be considered with software. According to Hannah, bugs like Log4Shell can be avoided with the right security mindset. An open culture is important here, one in which you can also discuss and criticize your own concept.

When creating the security concept, you should also consider the people who implement the measures as well as how to automate them. Some security aspects should not be handled by individual teams alone, but across all teams. This way you can avoid a strong leftward slide towards the dev team and still not work in isolated silos, as long as you keep a user-centric focus and pay attention to the people in the process.

That was just a short summary of Hannah Foxwell’s talk at stackconf 2022. You can watch her full talk on our YouTube Channel.
I’m already looking forward to the talks at the next stackconf and the opportunity to share thoughts and experiences with a wide variety of cool people there.

Michael Kübler
Systems Engineer

Michael worked in the restaurant industry for years before completing his retraining as an IT specialist at NETWAYS in 2022. Since then he has supported our customers with their projects as a MyEngineer and also looks for smaller side projects to build. In his private life he enjoys camping and cycling. He also enjoys a relaxed evening at home with a book and a whisky.

stackconf 2022 | Network Service Mesh

Wow, it was an outstanding stackconf 2022! Many thanks to all the organizers, speakers and participants who made it possible throughout three days full of Open Source Infrastructure Love. I am truly blessed to have been a part of this amazing event!

For all of you who would like to catch up on the talks of this special event, all of them are available on YouTube. My blog post today is all about Ricardo Castro and his talk “Network Service Mesh”. Due to personal circumstances, he was unfortunately unable to be on site and share his expertise with us in person. Nevertheless, he was able to record his talk in advance and send it to us. So I am going to recap his talk for you in this post and share some of my findings.

Why Network Service Mesh?

Microservice architectures build applications as a collection of loosely coupled services. In this type of architecture, the goal is for the services to be fine-grained. The main objective for teams is that their services are independent of other services. Building loosely coupled services eliminates many kinds of dependencies and their associated complexity. However, reliably managing network connectivity in distributed systems brings its own set of challenges. Service discovery, load balancing, fault tolerance, metrics collection and security are some examples of issues that arise with distributed systems.

In short, how do you facilitate workload collaboration to create an application on the fly that communicates regardless of where those workloads are running? Ricardo is going to answer this and plenty of other questions for us below, so prick up your ears!

Traditional Service Meshes

A service mesh is the connective tissue between all of your services. It adds capabilities such as traffic control, service discovery, load balancing, resilience, observability, security, etc. A service mesh allows applications to offload these capabilities from application-level libraries. It typically consists of two components, the data plane and the control plane. The data plane is made up of the business services alongside the proxies, which are responsible for all the traffic. The control plane comprises a set of services providing the administrative functions necessary to control the service mesh. There are many service mesh implementations to choose from.

Service mesh connectivity works well within a runtime domain, but workloads often need to interoperate with services outside the runtime domain to provide full functionality. But what are runtime and connectivity domains?

A runtime domain is essentially a compute domain that runs workloads. Traditionally, there is exactly one connectivity domain in a runtime domain because connectivity domains are coupled with runtime domains. This means that only workloads within the runtime domain can be part of the connectivity domain. To truly take advantage of this approach, microservice components should be minimally interdependent. Application service meshes work well within a connectivity domain and target Layer 7 protocols such as HTTPS. They can help cloud-native workloads achieve loose coupling through functionality such as service discovery and fault tolerance. Ricardo explored some of the use cases where traditional service meshes fail and where network service meshes can mitigate these challenges.

Where Do L7 Service Meshes Fall Short?

The first example is the multi/hybrid cloud scenario. In this scenario, you have multiple distinct and independent clusters that can be public cloud, private cloud or on-premises. The workloads in each of these clusters must be able to communicate with each other. In short, you have a number of workloads running in different connectivity domains that need to be connected together.

Another example is the Multi-Corp/Extra-Net scenario. In this scenario, workloads from different clusters also need to interconnect independent of the connectivity domains they are running in. The main difference between this and the hybrid cloud scenario is that you now have different domains of administrative control. This means that different organizations are managing their own runtime and connectivity domains. The workloads in each of these domains have to work with each other in some fashion.

Service mesh balkanization is another example of how traditional Layer 7 service meshes fall short. Traditional service meshes work well within the same Kubernetes cluster. When local service meshes need to collaborate, they are typically connected via gateways. These are usually static Layer 7 routes that are very difficult to scale and maintain. Moreover, traditional service meshes cannot guarantee reliable networking in such a scenario because they assume there is a reliable Layer 3 underneath them, which is usually the case within the same cluster, but cannot be guaranteed beyond that. Ricardo went on to examine how network service meshes tackle these challenges.

Network Service Mesh (NSM)

Before we dig in further, here is Ricardo’s excellent description of what Network Service Mesh actually is.

Network Service Mesh is a hybrid/multi-cloud IP service mesh that enables Layer 3 (L3) zero trust with per-workload granularity and per-network-service connectivity, security and observability. It works with your existing K8s Container Network Interface (CNI) and requires no changes to K8s or to your workloads. Network Service Mesh tackles all of the above-mentioned challenges by enabling individual workloads to connect securely, regardless of where they are running. Your runtime domain provides connectivity between clusters and requires no intervention. Before we dive into the key concepts of network service meshes, here are some use cases where NSM can be very useful. Basically, these are the same examples where traditional service meshes fall short:

  • A common Layer 3 domain that allows Databases (DB) running in multiple clusters, clouds, etc. to communicate with each other, such as DB replication.
  • A single Layer 7 service mesh such as Kuma, Linkerd, etc., that connects workloads running in multiple clusters, clouds, etc.
  • A single workload connecting to multiple Layer 7 service meshes.
  • Workloads from multiple organizations connecting to a single collaborative service mesh that is accessible across organizations.

NSM Key Concepts

In a network service mesh, a network service is a collection of features that are applied to the network traffic. These features provide connectivity, security and observability. A network service client is a workload that requests a connection to a network service by specifying the name of the network service. Clients are independently authenticated by SPIFFE ID and must be authorized to connect to the requested network service. In addition to the network service name, a client may declare a set of key-value pairs called labels. These labels can be used by the network service to select the proper endpoint, or by the endpoint itself to influence the way it provides service to the client.

A network service mesh endpoint is the entry point to the network service for the client. The network service is identified by its name and carries a payload. Network services are registered in the network service registry. In short, an endpoint can be a Pod running on the same or a different K8s cluster, an aspect of the physical network or anything else to which packets can be delivered for processing. A client and an endpoint are connected by a virtual wire (vWire), which acts as a virtual connectivity link between them. A client may also request the same network service multiple times, so multiple vWires can lead to the requested endpoint.

Network Service Mesh API

The Network Service Mesh API consists of the Request, Close and Monitor calls. A client can send a Request gRPC call to the NSM to establish a vWire between the client and the network service. To close a vWire, a client sends a Close gRPC call to the NSM. A vWire always has a limited expiration time, so the client usually sends a new Request message to refresh its vWire. A client can also send a Monitor gRPC call to the NSM to obtain information about the status of a vWire it has to a network service. If a vWire exceeds its expiration time without being refreshed, the NSM cleans up that vWire.
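
To make the lifecycle more tangible, here is a toy in-memory model in Python. It is not the real NSM gRPC API: the class and method names, the status strings and the explicit `now` parameter are assumptions for illustration. For simplicity it also keeps only one vWire per client and network service.

```python
from dataclasses import dataclass


@dataclass
class VWire:
    client: str
    network_service: str
    expires_at: float


class ToyNSM:
    """In-memory stand-in for the NSM; not the real gRPC API."""

    def __init__(self, ttl_seconds=60.0):
        self.ttl_seconds = ttl_seconds
        self.vwires = {}

    def request(self, client, network_service, now):
        # Creates the vWire, or refreshes it if it already exists.
        key = (client, network_service)
        self.vwires[key] = VWire(client, network_service, now + self.ttl_seconds)
        return self.vwires[key]

    def close(self, client, network_service):
        self.vwires.pop((client, network_service), None)

    def monitor(self, client, network_service, now):
        wire = self.vwires.get((client, network_service))
        if wire is None:
            return "closed"
        return "up" if wire.expires_at > now else "expired"

    def cleanup(self, now):
        # Drop every vWire whose expiration time has passed.
        expired = [k for k, w in self.vwires.items() if w.expires_at <= now]
        for key in expired:
            del self.vwires[key]
        return len(expired)
```

A client that calls `request` again before the expiration time keeps its vWire alive; one that stops refreshing loses it at the next `cleanup`.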

Network Service Mesh Registry

Like any other mesh, the Network Service Mesh has a registry in which network services and network service endpoints are registered. A network service endpoint provides one or more network services. It registers the list of network services it provides by name, together with the destination labels it advertises for each network service. Optionally, a network service can specify a list of matches, which allow the source labels a client sends with its request to be matched against the destination labels the endpoint advertised when it registered. You can find a detailed description in the documentation or on YouTube, where Ricardo explains it very well in person.
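
A strongly simplified sketch of such label matching could look like this in Python. The class, the method names and the matching rule (every label the client sends must equal the endpoint's advertised label) are assumptions for illustration; the real NSM match process is more expressive.

```python
from collections import defaultdict


class ToyRegistry:
    """In-memory stand-in for the NSM registry; names are hypothetical."""

    def __init__(self):
        self._endpoints = defaultdict(list)

    def register(self, endpoint_name, network_service, destination_labels):
        # An endpoint advertises a network service together with its labels.
        self._endpoints[network_service].append(
            (endpoint_name, dict(destination_labels))
        )

    def candidates(self, network_service, source_labels):
        # Keep the endpoints whose labels agree with everything the client sent.
        return [
            name
            for name, dest in self._endpoints.get(network_service, [])
            if all(dest.get(k) == v for k, v in source_labels.items())
        ]


registry = ToyRegistry()
registry.register("ep-eu", "secure-db", {"region": "eu"})
registry.register("ep-us", "secure-db", {"region": "us"})
print(registry.candidates("secure-db", {"region": "eu"}))  # ['ep-eu']
```

A client that sends no labels at all matches every endpoint of the requested network service in this sketch.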

Well, how do you put it all together?

At a high level, a network service endpoint registers one or more network services with the NSM. It registers that list by name, together with the destination labels it advertises for each network service. A client can send a request to the NSM to establish a vWire between the client and the network service. It can also close the connection and, since the vWire has an expiration time, monitor the state of its vWire. The interesting thing about the NSM is that it allows workloads to request services regardless of where they are running. It is up to the NSM to create the necessary vWires and enable communication between workloads in a secure manner.

Conclusion

Ricardo’s talk covers the issues that Network Service Mesh addresses and its key concepts, but as he told us, there is much more to talk about. He recommends looking at floating inter-domain, where a client requests a network service from any domain in the network service registry, regardless of where it is running. He also encourages us to take a look at some of the advanced features. The Network Service Mesh match process, which selects candidate endpoints for a specific network service, can be used to implement a variety of advanced features such as Composition, Selective Composition, Topologically Aware Endpoint Selection, and Topologically Aware Scale from Zero. Another important note Ricardo shared with us is that Network Service Mesh has been a CNCF project since 2019 and is currently at the Sandbox maturity level. If you don’t know what CNCF is, it’s definitely worth taking a look at the referenced documentation.

It’s a very interesting and informative topic, and I learned a lot from Ricardo. I had never dealt with it before, but after watching this great talk it is quite easy to follow. I hope I was able to reflect what Ricardo talked about in this text. As I mentioned at the beginning, the recorded video of this talk and all the other stackconf talks are available on YouTube! Enjoy!

Yonas Habteab
Developer

Yonas successfully completed his apprenticeship as an IT specialist for application development at NETWAYS in 2022. He enjoys continually building his programming knowledge and is passionate about his work at Icinga. When he is not programming, he likes to play football with his friends or go for walks in the city, and he enjoys quiet evenings in front of the TV.

stackconf 2022 | A big “Thank You” to our Sponsors & Partners!

This year’s Open Source Infrastructure Conference was an absolutely successful on-site event, and today we take the opportunity to thank all of our contributors for their great support!

Sponsors & Partners

We’re happy to shout out a big thank you to

Thanks to all of you, whether sponsor or media partner! It was a great pleasure to have you on board, and we hope you enjoyed your sponsorship benefits!

Take a glance back!

For all of you who either couldn’t join stackconf 2022 or want to revisit the talks afterwards, we provide all speaker talks including slides and videos as well as lots of awesome photographs in our archives. So take the chance and go for a walk down memory lane. Enjoy lots of interesting and inspiring talks around open source infrastructure solutions. There’s something for everyone!

Stay tuned!

Are you already excited about taking part in stackconf 2023? Well, we have good news for you! The conference will take place in September next year, so stay tuned!

Katja Kotschenreuther
Marketing Manager

Katja has been part of the marketing team since October 2020. As Online Marketing Manager, she looks after the optimisation of our websites and social media campaigns and, above all, the promotion of our conferences and trainings. In her free time she is always on the hunt for new geocaches, loves travelling the world, cuddles every baby animal that crosses her path, and regularly visits her Lower Bavarian home town of Passau.