Telegram channel devopslibrary - DevOps&SRE Library: Unsorted

DevOps&SRE Library

08 April 2024 18:00

How to deal with alert fatigue head-on

Everyone experiences stress at work—thankfully, it’s a topic folks aren’t shying away from anymore.

But for on-call engineers, alert fatigue hits closer to home. Unfortunately, it can be just as insidious as stress and can drastically affect those who experience it.

First discussed in hospital settings, the phrase later entered engineering circles. Alert fatigue occurs when an excessive number of alerts overwhelms the people responsible for responding to them, often over a prolonged period, so that alerts are missed, handled late, or ignored altogether.

This fatigue reaches beyond the individual and can create significant risks for your organization.

But, if you approach on-call the right way, you can mitigate the impacts of alert fatigue or, better yet, avoid it altogether. Here, we'll dive into the tactics teams can implement to address alert fatigue and its underlying causes.

https://incident.io/hubs/on-call/dealing-with-alert-fatigue-head-on

08 April 2024 09:01

Service Level Agreement

Introduction to the SLA in relation to SLI and SLO

https://blog.alexewerlof.com/p/sla
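
To make the relationship between the three terms concrete, here is a minimal sketch, not taken from the linked article. It assumes an availability-style SLI (good events divided by total events), an internal SLO target, and a looser contractual SLA threshold. All function names and numbers are hypothetical.

```python
def availability_sli(good_events: int, total_events: int) -> float:
    """SLI: the measured fraction of events that counted as 'good'."""
    if total_events <= 0:
        raise ValueError("total_events must be positive")
    return good_events / total_events


def error_budget_remaining(sli: float, slo: float) -> float:
    """Fraction of the error budget still unspent for a given SLO target.

    1.0 means nothing has been spent, 0.0 means the budget is exhausted,
    and a negative value means the SLO has been missed.
    """
    if not 0.0 < slo < 1.0:
        raise ValueError("slo must be strictly between 0 and 1")
    allowed_failure = 1.0 - slo   # failure rate the SLO tolerates
    actual_failure = 1.0 - sli    # failure rate actually observed
    return 1.0 - actual_failure / allowed_failure


def sla_breached(sli: float, sla: float) -> bool:
    """SLA: the contractual threshold; falling below it has consequences."""
    return sli < sla


if __name__ == "__main__":
    # Hypothetical numbers: 1,000,000 requests, 1,000 of them failed.
    sli = availability_sli(999_000, 1_000_000)
    print(f"SLI: {sli:.4f}")
    print(f"Error budget left against a 99.5% SLO: {error_budget_remaining(sli, 0.995):.0%}")
    print(f"99% SLA breached: {sla_breached(sli, 0.99)}")
```

The SLO is usually set stricter than the SLA so that the team notices budget burn well before a contractual breach.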

07 April 2024 09:01

Best practices for monitoring software testing in CI/CD

A key challenge of monitoring your CI/CD system is understanding how to optimize your workflows and create best practices that help you minimize pipeline slowdowns and better respond to CI issues. In addition to monitoring CI pipelines and their underlying infrastructure, your organization also needs to cultivate effective relationships between platform and development teams. Fostering collaboration between these two teams is a critical and equally valuable aspect of improving the reliability and performance of your CI.

In this post, we’ll explore how platform teams can help developers visualize trends in CI test performance and notify them of new flaky tests, test failures, and performance regressions with dashboards and monitors. We’ll also detail best practices that can help developers identify, investigate, and remediate flaky tests.

https://www.datadoghq.com/blog/best-practices-for-monitoring-software-testing

06 April 2024 09:00

Fine-grained RBAC for GitHub Action workflows with GitHub OIDC and HashiCorp Vault

https://www.digitalocean.com/blog/fine-grained-rbac-for-github-action-workflows-hashicorp-vault

05 April 2024 17:00

garnet

Garnet is a remote cache-store from Microsoft Research that offers strong performance (throughput and latency), scalability, storage, recovery, cluster sharding, key migration, and replication features. Garnet can work with existing Redis clients.

https://github.com/microsoft/garnet

05 April 2024 09:01

How Figma’s databases team lived to tell the scale

Our nine-month journey to horizontally shard Figma’s Postgres stack, and the key to unlocking (nearly) infinite scalability.

Figma’s database stack has grown almost 100x since 2020. This is a good problem to have because it means our business is expanding, but it also poses some tricky technical challenges. Over the past four years, we’ve made a significant effort to stay ahead of the curve and avoid potential growing pains. In 2020, we were running a single Postgres database hosted on AWS’s largest physical instance, and by the end of 2022, we had built out a distributed architecture with caching, read replicas, and a dozen vertically partitioned databases. We split groups of related tables—like “Figma files” or “Organizations”—into their own vertical partitions, which allowed us to make incremental scaling gains and maintain enough runway to stay ahead of our growth.

https://www.figma.com/blog/how-figmas-databases-team-lived-to-tell-the-scale
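
The difference between the vertical partitioning described above and horizontal sharding can be sketched in a few lines. This is an illustration only, not Figma's implementation: the table-to-database map, the database names, and the hash-based routing function are all assumptions made for the example.

```python
import hashlib

# Vertical partitioning: whole groups of related tables live in their own database.
# The table and database names here are hypothetical.
VERTICAL_PARTITIONS = {
    "files": "db_files",
    "organizations": "db_orgs",
}


def database_for_table(table: str) -> str:
    """Route a query by table name (vertical partitioning)."""
    return VERTICAL_PARTITIONS[table]


def shard_for_key(shard_key: str, num_shards: int) -> int:
    """Route a row by a hash of its shard key (horizontal sharding).

    A stable hash (SHA-256 here) is used instead of Python's built-in
    hash(), which is randomized per process for strings.
    """
    if num_shards <= 0:
        raise ValueError("num_shards must be positive")
    digest = hashlib.sha256(shard_key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


if __name__ == "__main__":
    print(database_for_table("files"))
    # Rows of the same table are spread over several shards by key.
    for key in ("org-1", "org-2", "org-3"):
        print(key, "-> shard", shard_for_key(key, 4))
```

Vertical partitioning stops helping once a single table outgrows one database, which is the point where rows of that table have to be spread across shards by key.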

04 April 2024 17:01

excalidraw

An open source virtual hand-drawn style whiteboard.

https://github.com/excalidraw/excalidraw

04 April 2024 09:00

gritql

GritQL is a declarative query language for searching and modifying source code.

https://github.com/getgrit/gritql

03 April 2024 17:00

lapdev

Lapdev is a self-hosted application that spins up remote development environments on your own servers or clouds. It scales from a single machine in the corner to a global fleet of servers. It uses the Devcontainer open specification to define your development environment as code. If you’re interested in a deep dive into how Lapdev works, you can read about its architecture here.

https://github.com/lapce/lapdev

03 April 2024 09:01

monolith

A data hoarder’s dream come true: bundle any web page into a single HTML file. You can finally replace that gazillion of open tabs with a gazillion of .html files stored somewhere on your precious little drive.

Unlike the conventional “Save page as”, monolith not only saves the target document, it embeds CSS, image, and JavaScript assets all at once, producing a single HTML5 document that is a joy to store and share.

Compared to saving websites with wget -mpk, this tool embeds all assets as data URLs and therefore lets browsers render the saved page exactly the way it was on the Internet, even when no network connection is available.

https://github.com/Y2Z/monolith

02 April 2024 17:00

agola

CI/CD redefined

https://github.com/agola-io/agola

02 April 2024 09:01

wezterm

A GPU-accelerated cross-platform terminal emulator and multiplexer written by @wez and implemented in Rust.

https://github.com/wez/wezterm

01 April 2024 10:03

Colocate your own hardware or rent a rack unit in a modern Tier III data center.

🛍 Rent a unit or a whole rack at up to 15% off in the Filanko data center!

When you rent a server from us, you get:

• Uninterrupted power supply with a load capacity of up to 11 kW per rack;
• A 100 Mbit/s port + 1 IP address;
• Components from leading global manufacturers;
• Guaranteed traffic exchange with more than 540 telecom operators;
• Free installation of a Linux-based OS;
• 24/7 technical support and help with administration;
• Personal discounts and offers.

↘ Learn more about the data center or submit a request

🎁 Get a discount

✈ Subscribe to the Filanko data center channel.

Advertisement. OOO "Sititelecom Saint Petersburg", INN 7838067849, erid: 2VtzqwzgXv5

31 March 2024 17:00

kube-metrics-adapter

Kube Metrics Adapter is a general purpose metrics adapter for Kubernetes that can collect and serve custom and external metrics for Horizontal Pod Autoscaling.

https://github.com/zalando-incubator/kube-metrics-adapter
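
For context on what the Horizontal Pod Autoscaler does with a custom or external metric once an adapter serves it, here is a minimal sketch of the scaling rule from the Kubernetes documentation: desiredReplicas = ceil(currentReplicas * currentMetricValue / desiredMetricValue). The function name is hypothetical, and the sketch ignores min/max replica bounds, stabilization windows, and pod readiness handling.

```python
import math


def desired_replicas(
    current_replicas: int,
    current_metric: float,
    target_metric: float,
    tolerance: float = 0.1,
) -> int:
    """Apply the basic HPA scaling rule to a single metric.

    The 0.1 tolerance mirrors the documented default: when the ratio is
    within 10% of 1.0, the autoscaler leaves the replica count unchanged.
    """
    if target_metric <= 0:
        raise ValueError("target_metric must be positive")
    ratio = current_metric / target_metric
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas
    return math.ceil(current_replicas * ratio)


if __name__ == "__main__":
    # Hypothetical example: 4 pods, queue length of 200 per pod, target of 100.
    print(desired_replicas(4, 200, 100))
```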

30 March 2024 16:00

dragonfly-operator

Dragonfly Operator is a Kubernetes operator used to deploy and manage Dragonfly instances inside your Kubernetes clusters.

https://github.com/dragonflydb/dragonfly-operator

08 April 2024 11:04

erid: 2Vtzqx6QSDc

Want to improve your software development skills and make data-driven decisions? Then this open lesson is for you!

In this webinar you will learn how to use ArgoCD, a configuration management tool that lets you pull information through its API and analyze how your system changes over time. We will look at various metrics, such as DORA, Engineering, and MTT, that help you identify bottlenecks and propose changes backed by data.

The lesson will be useful to anyone who wants to apply a data-driven decision-making approach in their work. At the end of the lesson you will receive a ready-made framework, "How to start working with metrics".

We meet on 10 April at 20:00 MSK as part of the "SRE Practices and Tools" course.

Register for the free lesson here: https://clck.ru/39qRHt

07 April 2024 17:00

Documentation as code: Principles, workflow, and challenges

Core principles of documentation-as-code tools

- Treating documentation with the same rigor as code
- Storing documentation in version control
- Automating documentation generation and deployment
- Reviewing documentation updates through peer review

https://www.tabnine.com/blog/documentation-as-code-principles-workflow-and-challenges

06 April 2024 17:02

Properly Running Kubernetes Jobs with Sidecars in 2024 (K8s 1.28+)

Kubernetes has been a great orchestrator of Jobs and CronJobs for over half a decade now, but if you had a need for running proxy containers or other secondary containers alongside the job, running things properly took a bit of work and decision-making to handle gracefully.

This article introduces the easiest way to run Jobs with sidecars using the latest Kubernetes features, and has a complementary repository with complete example manifests you can try in your own cluster. The repository contains all the examples for earlier versions of K8s as well, so make sure to focus on the cronjob.sidecar.*.yaml examples.

https://medium.com/teamsnap-engineering/properly-running-kubernetes-jobs-with-sidecars-in-2024-k8s-1-28-ad9b51d17d50
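
As a rough illustration of the kind of manifest involved, the sketch below builds a Job that uses the native sidecar mechanism available from Kubernetes 1.28 (alpha there, behind the SidecarContainers feature gate): an init container with restartPolicy set to Always. Whether this matches the article's examples exactly is an assumption. The function name, container names, and image names are hypothetical, and the manifest is emitted as JSON, which kubectl accepts alongside YAML.

```python
import json


def job_with_sidecar(name: str, main_image: str, sidecar_image: str) -> dict:
    """Build a Job manifest whose sidecar is a restartable init container."""
    return {
        "apiVersion": "batch/v1",
        "kind": "Job",
        "metadata": {"name": name},
        "spec": {
            "template": {
                "spec": {
                    "restartPolicy": "Never",
                    # An init container with restartPolicy "Always" runs as a
                    # sidecar: it starts before the main container, keeps
                    # running next to it, and does not block Job completion.
                    "initContainers": [
                        {
                            "name": "proxy",
                            "image": sidecar_image,
                            "restartPolicy": "Always",
                        }
                    ],
                    "containers": [{"name": "main", "image": main_image}],
                }
            }
        },
    }


if __name__ == "__main__":
    # Hypothetical image names, for illustration only.
    manifest = job_with_sidecar("report", "example/report:1.0", "example/proxy:1.0")
    print(json.dumps(manifest, indent=2))
```

The printed JSON could be piped to kubectl apply -f - on a cluster where the feature is enabled.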

05 April 2024 18:03

Tired of putting out fires in your pipeline? Let's figure out how to avoid them!

Join the online meetup "CI/CD and SRE: The Architecture of Flawless Deployment" from Sber, supported by JUG Ru Group.

The meetup takes place on 9 April at 17:00 (MSK, GMT+3). We will send the stream link 1 hour before the start.

Experts from Sber and VK will cover:
✔️ Which rules help you prepare reliably for a production release.
✔️ How not to break observability with your own hands, and what to do if you did.
✔️ Why a distributed system is a better fit for the enterprise.
✔️ How to keep Jenkins stable in large projects.
✔️ Why build your own tools when open source exists.
✔️ How to set up a pipeline from CI to PROD in one day.

Register on the meetup website.

Advertisement. PAO Sberbank. INN 77070838

05 April 2024 11:04

Cloud technologies increase infrastructure flexibility by 22%
And where there is flexibility, there are more frequent deployments, shorter lead times for changes, and faster recovery. Meet VK Cloud, a secure, technologically advanced platform with a wide range of cloud services for efficient development and data work.

🔹 Everything you need for development: virtual machines, databases, GPUs, Kubernetes, S3 storage, backups, and solutions for machine learning and Big Data.
🔹 Audit, migration, monitoring, and other VK best practices from a team of experienced engineers.
🔹 Comprehensive protection of web services against attacks and breaches.

Sign up for VK Cloud and get 3,000 ₽ to test cloud services for 60 days!

04 April 2024 18:04

❓ How do you create and configure the different types of Services in Kubernetes? The topic matters because Services play a key role in deploying scalable, reliable containerized applications.

👨‍🎓 Master it in a free hands-on lesson from OTUS.
In this webinar you will learn how to create and configure the different types of Services in Kubernetes:

✔️ ClusterIP for internal connectivity;
✔️ ExternalService for external access;
✔️ NodePort for opening a port at the node level;
✔️ LoadBalancer for load balancing.

📆 The session takes place on 11 April at 20:00 (MSK) as part of enrollment for the online course "Infrastructure Platform Based on Kubernetes".

💥 The speaker is a course instructor and a practicing Senior DevOps Engineer. During the webinar you can also ask the expert about the course itself and about graduates' prospects.

👉 Take a short test right now to attend the free lesson: https://vk.cc/cvW9v2

🔥 Everyone who passes the entry test and signs up for this course's free webinar will get a special price on the course. Discuss your training with the OTUS managers!

Advertisement. OOO "Otus Online Education", OGRN 1177746618576, www.otus.ru, erid: 2VtzqvEGzwY

04 April 2024 11:52

🔝 Sber, Ostrovok.ru, B2Broker, Yandex, Innotech, Andersen, and many others already use Chaos Engineering to test their systems.

Large companies are looking for people who know how to test systems. We are launching a video course that will help you broaden your technology stack and gain a useful new skill.

By the end of the course you will:

✅ understand why Chaos Engineering is worth learning and what kinds of experiments exist;

✅ know which tools can be used to run experiments and how to choose the right one;

✅ have practice testing several hypotheses across several experiments;

✅ be able to explain experiment results to management;

✅ know how to generate hypotheses;

✅ be able to teach this approach to your colleagues.

The course is released on 22 April, but we are already running a contest for 3 free places for those who want to learn to manage chaos!

DETAILS 📌

03 April 2024 18:04

Take part in the annual State of DevOps in Russia survey!

Express 42, together with Deckhouse, Yandex Cloud, hh.ru, AvitoTech, Tinkoff, JUG Ru Group, and OTUS, is running a large-scale study of the state of DevOps in Russia.

As part of this study, we ask everyone interested in DevOps to share their expertise. The survey is anonymous and takes no more than 20 minutes.

After completing the survey you can enter a prize draw. We will pick 50 winners and raffle off the following prizes among them:

🔸 useful books on DevOps;
🔸 promo codes for discounts on OTUS courses;
🔸 free online tickets to the DevOps 2024 conference;
🔸 Tinkoff Pro subscriptions;

as well as:
🔸 20 six-month Telegram Premium subscriptions;
🔸 3 offline tickets to the DevOops 2024 conference.

Take the survey here 👉 https://anketolog.ru/e/46663395/BYAY0lqA

02 April 2024 18:10

Yandex Cloud developers will explain what is "under the hood" of their services

On 4 April we are holding our now-traditional meetup, about:cloud - infrastructure, where we will talk about how our infrastructure and network services are built.

At the meetup we will discuss:

• how the service that connects the world of virtual networks with classic routers and network devices works,
• how we made Yandex Monitoring and Prometheus® work together,
• the components for building a high-load, stable cloud DNS,
• a service for load testing and performance analysis,
• how network block storage is built and which disk types it offers.

about:cloud - infrastructure is a chance to exchange experience with developers, architects, and DevOps specialists, discuss solutions to non-trivial technical problems, and get answers to your most pressing questions.

Join us

02 April 2024 11:03

Where should you start learning microservice architecture?

Come to the free hands-on lesson "Authentication and Authorization of Microservices", where an experienced expert will cover:

1. An introduction to microservice architecture
2. The authentication pattern in microservices
3. The authorization pattern and access control
4. Security and monitoring

The session takes place on 3 April at 20:00 MSK as part of the "Microservice Architecture" course. Installment payment for the course is available!

Take a short test right now to attend the free lesson and get the recording: https://vk.cc/cvSzp7

Advertisement. OOO "Otus Online Education", OGRN 1177746618576, www.otus.ru, erid: 2VtzqvvLdUJ

01 April 2024 17:01

alacritty

Alacritty is a modern terminal emulator that comes with sensible defaults, but allows for extensive configuration. By integrating with other applications, rather than reimplementing their functionality, it manages to provide a flexible set of features with high performance. The supported platforms currently consist of BSD, Linux, macOS and Windows.

https://github.com/alacritty/alacritty

01 April 2024 09:00

zellij

Zellij is a workspace aimed at developers, ops-oriented people and anyone who loves the terminal. Similar programs are sometimes called "Terminal Multiplexers".

Zellij is designed around the philosophy that one must not sacrifice simplicity for power, taking pride in its great experience out of the box as well as the advanced features it places at its users' fingertips.

https://github.com/zellij-org/zellij

31 March 2024 09:00

helm-compose

Helm Compose is a tool for managing multiple releases of one or many different Helm charts. It is heavily inspired by Docker Compose and is an extension of the package manager idea behind Helm itself. It allows for full configuration-as-code capabilities in a single YAML file.

https://github.com/seacrew/helm-compose

30 March 2024 08:02

alaz

Alaz is an open-source Ddosify eBPF agent that can inspect and collect Kubernetes (K8s) service traffic without the need for code instrumentation, sidecars, or service restarts. This is possible due to its use of eBPF technology.

Alaz can create a Service Map that helps identify golden signals and problems like:

- High latencies between K8s services
- 5xx HTTP status codes
- Idle / zombie services
- Slow SQL queries

https://github.com/ddosify/alaz
