TechLead Bits

Explore articles, books, news, videos, and insights on software architecture, people management, and leadership. Author: @nelia_loginova

TechLead Bits

Obsidian: Task Management. Part 2.

When the Obsidian plugins are configured, it's time to organize a process for working with tasks.

My approach is simple:

1. Add tasks to related notes. During work, meetings, and investigations I put tasks directly into the related notes, with a short description, priority, tags, and due date.
For example, I had a meeting where we discussed CI improvements. As a result, I created a note with the meeting minutes and added the tasks that were on my side: talk with the IT team about CI cluster stability, add retries to the test collections, etc.

2. Create a TODO. When I don't want to think about where a task should go, I just write it down in the TODO file. TODO is an unsorted list of tasks that I collect during the day.
For example, a colleague asked for help or for specific information, the PM requested a sprint status, etc.

3. Create a Today view. As tasks are spread across different notes, I need to collect them in a single place, so I created a special page called Today. I don't write any tasks there; I use it as a daily dashboard with the following sections:
🔸 Focus. Key global topics to focus on, static.
🔸 Doing. Tasks that are already in progress:

```tasks
status.type is in_progress
sort by due
short mode
group by tags
```


🔸 Do Now. My backlog, grouped by context like management, tech, education. I also have an “others” group for everything else (otherwise I sometimes lose tasks without tags 🫣):
```tasks
not done
sort by due
sort by priority
short mode
tags include #management
group by tags
```


➡️ Tip: short mode shows a link to the source note containing the task, which is helpful for navigating to the full context.
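
For the “others” group mentioned above, a query along these lines can catch tasks that have no tags at all (a sketch; check the exact filter names against your Tasks plugin version):

```tasks
not done
no tags
sort by due
short mode
```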

4. Task Board. I use it as a timeline for tasks: today, tomorrow, overdue, etc.

I don't claim my system is ideal; it just works for me, and I periodically tune it when I feel something isn't really working.

Hope this gives you a good starting point to build your own task management system. Start simple, experiment, and make the system work for you.

#softskills #productivity

TechLead Bits

Obsidian: Task Management. Part 1.

This is the third part of my Obsidian setup series (see Note Taking with Obsidian and Note Taking: Knowledge Base Approach). Today I'll show how I tune Obsidian for task management and how I actually use it.

Let's start with some additional configuration:

Tasks Plugin
To make it suitable for me, I added the following:
🔸 "In Progress" Status. By default, tasks are Todo or Done. In most cases it's not enough. Some tasks require time, so "In Progress" indicates tasks that I've already started but not finished.
🔸 Custom Statuses. I also created a few extra statuses like Critical, Idea, Escalation, Talk. They actually work more like task types to distinguish different activities or workflows (and it's super useful to query data in different statuses).
🔸 Status Transitions. The plugin allows setting transitions between statuses: TODO -> In Progress -> Done, Talk -> Done.
🔸 Status Visualization. Additionally, I assign icons to the statuses to make tasks easier to identify in a list. I use a CSS snippet from the ITS Theme.
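
For illustration, this is roughly how tasks with custom statuses look inside a note. The status characters and the emoji signifiers for due date, priority, and done date follow common Tasks plugin conventions; the actual symbols depend on your configuration, and the task texts and dates are made up:

```markdown
- [ ] Talk with the IT team about CI cluster stability 📅 2025-05-12 ⏫ #management
- [/] Add retries to the flaky test collections 📅 2025-05-10 #tech
- [x] Send sprint status to the PM ✅ 2025-05-06
```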

Colored Tags Wrangler
Tags are good, but color-coded tags are great. I assigned colors for most frequently used tags to group them visually in the file.

Task Board Plugin
The plugin is not actively maintained, but it's still useful. It shows tasks grouped by timeline: Today, Tomorrow, Future, Overdue, etc. To me it looks very similar to a Jira dashboard 😉

With that, the configuration is complete, and you're ready to track your own tasks!

#softskills #productivity

TechLead Bits

Building Reusable Libraries

For many years, I've worked in product teams where we don't just deliver product features but also build shared services and libraries for other development teams. So the question of how to create a good reusable library is really important to me.

That’s why the recent Thoughtworks publication, “Decoupled by design: Building reusable enterprise libraries and services”, caught my attention.

The authors define the following success factors for building shared libraries:
🔸 Build with at least one customer in mind. Be sure that at least one team is ready to use your library. Use them as early adopters to collect real feedback.
🔸 Set a product vision. Your library or service should solve a specific problem. Know exactly what it is and stay focused on it.
🔸 Make it easy to use. Adoption depends on simplicity. Good documentation, self-service access, clear migration paths — all these help teams use your library.
🔸 Design for extensibility. Follow the open-closed principle — open for extension but closed for modification. This ensures teams can extend the library to meet their specific needs.
🔸 Encourage contributions. Use an open-source model: allow internal teams to contribute to common libraries, services, and platforms.
🔸 Continuous improvement. Maintain multiple versions, set a versioning strategy (like semver), define clear deprecation policies, and stay aligned with industry tech standards.
🔸 CI/CD. Use functional and cross-functional tests to ensure the stability and quality of the delivered artifacts.


While the article doesn’t reveal anything really new, it contains good principles to keep in mind during implementation. When done right, shared libraries and services can significantly reduce development costs and time to market for new features and capabilities.

#engineering #architecture

TechLead Bits

Zanzibar: Google Global Authorization System

Finally I had a chance to go into the details of Zanzibar, Google's global authorization system. I already mentioned it in the OpenFGA overview, where the authors said they based their solution on Zanzibar's architecture principles.

Let's check how a system that performs millions of authorization checks per minute is organized:

✏️ Any authorization rule takes the form of a tuple `user U has relation R to object O`. For example, user 15 is an owner of doc:readme. This unification helps support efficient reads and incremental updates.
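
In the whitepaper's notation, such tuples are written as object#relation@user, where the user can also be a userset. The examples below are illustrative:

```
doc:readme#owner@user:15            // user 15 is an owner of doc:readme
doc:readme#viewer@group:eng#member  // any member of group:eng can view doc:readme
```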

✏️ Zanzibar stores ACLs and their metadata in the Google Spanner database. Zanzibar's logic relies heavily on Spanner's external consistency guarantees: each ACL update gets a timestamp that reflects its order, so if update x happens before y, then x has an earlier timestamp.

✏️ Each ACL entry is identified by shard ID, object ID, relation, user, and commit timestamp. Multiple tuple versions are stored in different rows, which makes it possible to evaluate checks and reads at any timestamp within the garbage collection window (7 days).

✏️ Each Zanzibar client gets a special consistency token called a zookie. A zookie contains the current global timestamp, and the client uses it to ensure that an authorization check is based on ACL data at least as fresh as its latest change.

✏️ Zookies are also used in read requests to guarantee that clients get a data snapshot no earlier than a previous write.

✏️ Incoming requests are handled by aclserver clusters. Each server in a cluster can delegate the computation of intermediate results to other servers.

✏️ To provide performance isolation, Zanzibar measures how much CPU each RPC uses in cpu-seconds. Each client has a global CPU usage limit, and if it goes over, its requests may be slowed down. Each aclserver also limits the total number of active RPCs to manage memory usage.

✏️ Request hedging with a 99th percentile threshold is used to reduce tail latency.

According to the whitepaper, authorization checks are performed for each object independently. It means that a single search request in a service like Drive or YouTube can trigger from tens to hundreds of authorization checks. That's why the overall architecture is heavily focused on keeping authorization request latency as low as possible.

The implementation results are impressive: Zanzibar handles over 2 trillion relation tuples, which occupy more than 100 terabytes of storage. The load is spread across 10,000+ servers in dozens of clusters worldwide. Despite that scale, it keeps the 95th percentile latency at ~9 ms for in-zone requests and ~60 ms for other requests.

#systemdesign #usecase #architecture

TechLead Bits

External Consistency

Recently, I read some Google Research whitepapers and came across several concepts that are not widely used but are very interesting from a system design point of view. One such concept is external consistency.

We’re all more or less familiar with common consistency levels like sequential, strict, linearizable, causal, and eventual. But external consistency is a little bit different:

To be externally consistent, a transaction must see the effects of all the transactions that complete before it and none of the effects of transactions that complete after it, in the global serial order.

It means that if transaction A commits before transaction B (as observed externally by clients), then timestamp(A) < timestamp(B). So all transactions can be represented as a sequential changelog.

In other words, external consistency guarantees that all clients see changes in the same global order, no matter where they are (same datacenter, different datacenters, different regions).

This consistency level relies on globally ordered, unique timestamps across system components. To provide them, Google implemented a special clock infrastructure called TrueTime (backed by GPS and atomic clocks in each datacenter), which exposes bounded clock uncertainty and lets servers generate monotonically increasing commit timestamps.
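
To see why this ordering works, here is a minimal Go sketch of the commit-wait idea described in the Spanner papers. This is not the real TrueTime API: the uncertainty interval is faked locally, and all names are mine.

```go
package main

import (
	"fmt"
	"time"
)

// TTInterval models TrueTime's answer to "what time is it?":
// the real current time is guaranteed to lie within [Earliest, Latest].
type TTInterval struct {
	Earliest, Latest time.Time
}

// ttNow is a stand-in for the TrueTime API; here we fake a +/-3 ms uncertainty.
func ttNow() TTInterval {
	t := time.Now()
	eps := 3 * time.Millisecond
	return TTInterval{Earliest: t.Add(-eps), Latest: t.Add(eps)}
}

// commitTimestamp illustrates "commit wait": pick a timestamp no earlier than
// any current clock reading, then wait until that timestamp is definitely in
// the past everywhere. Any transaction that commits afterwards is guaranteed
// to get a strictly larger timestamp, which yields the global serial order.
func commitTimestamp() time.Time {
	ts := ttNow().Latest
	for !ttNow().Earliest.After(ts) {
		time.Sleep(time.Millisecond) // wait out the clock uncertainty
	}
	return ts
}

func main() {
	a := commitTimestamp()
	b := commitTimestamp()   // commits after a, as observed externally
	fmt.Println(a.Before(b)) // always true
}
```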

For more technical details, you can check Why you should pick strong consistency, whenever possible.

External consistency is actively used in Google Cloud Spanner and Google Zanzibar, and probably something similar exists in AWS Aurora. What I like about this model is that it shifts the complexity to the storage layer, so app developers can rely on consistency guarantees in their business logic.

#systemdesign #patterns

TechLead Bits

Hashicorp Plugin Ecosystem

Back when Go didn't have a plugin package, HashiCorp implemented their own plugin architecture. The main difference from other plugin systems is that it works over RPC. At first, that might sound a bit unusual, but the approach shows really good results, and it is actively used in many popular products like HashiCorp Vault, Terraform, Nomad, and Velero.

Key concepts:
✏️ Plugin is a binary that runs an RPC (or gRPC) server.
✏️ A main application loads plugins from a specified directory and runs them as OS child processes.
✏️ A single connection is made between each plugin and the host process.
✏️ The connection is bidirectional, so the plugin can also call application APIs.
✏️ The plugin and the application must be on the same host and use the local network only; no remote calls are allowed.
✏️ Each plugin provides a protocol version that can be used as its API version.
✏️ A special handshake is used to establish a connection. The plugin writes its protocol version, network type, address and protocol to stdout, and the main app uses this information to connect.
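
For illustration, the plugin-side wiring with the go-plugin library looks roughly like this. It is a sketch based on the library's basic net/rpc example: the Greeter interface, the handshake values, and the plugin name are made up, and details may differ between library versions.

```go
package main

import (
	"net/rpc"

	"github.com/hashicorp/go-plugin"
)

// Greeter is the interface the host application and the plugin agree on.
type Greeter interface {
	Greet() string
}

// helloGreeter is the real implementation living inside the plugin binary.
type helloGreeter struct{}

func (helloGreeter) Greet() string { return "hello from the plugin" }

// GreeterRPCServer exposes the implementation over net/rpc inside the plugin process.
type GreeterRPCServer struct{ Impl Greeter }

func (s *GreeterRPCServer) Greet(_ interface{}, resp *string) error {
	*resp = s.Impl.Greet()
	return nil
}

// GreeterRPC is the client stub the host application receives after Dispense().
type GreeterRPC struct{ client *rpc.Client }

func (g *GreeterRPC) Greet() string {
	var resp string
	if err := g.client.Call("Plugin.Greet", new(interface{}), &resp); err != nil {
		return ""
	}
	return resp
}

// GreeterPlugin tells go-plugin how to build the server and client sides.
type GreeterPlugin struct{ Impl Greeter }

func (p *GreeterPlugin) Server(*plugin.MuxBroker) (interface{}, error) {
	return &GreeterRPCServer{Impl: p.Impl}, nil
}

func (p *GreeterPlugin) Client(_ *plugin.MuxBroker, c *rpc.Client) (interface{}, error) {
	return &GreeterRPC{client: c}, nil
}

func main() {
	// The plugin binary starts an RPC server and prints the handshake line
	// (protocol version, network type, address) to stdout for the host app.
	plugin.Serve(&plugin.ServeConfig{
		HandshakeConfig: plugin.HandshakeConfig{
			ProtocolVersion:  1,
			MagicCookieKey:   "MY_APP_PLUGIN",
			MagicCookieValue: "d3adb33f",
		},
		Plugins: map[string]plugin.Plugin{
			"greeter": &GreeterPlugin{Impl: helloGreeter{}},
		},
	})
}
```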

Benefits of the approach:
✏️ Plugins can't crash the main process
✏️ Plugins can be written in different languages
✏️ Easy installation - just put a binary into the folder
✏️ Stdout/stderr syncing. Since plugins are subprocesses, they can keep using stdout/stderr as usual, and their output is mirrored to the host app.
✏️ Host upgrades while a plugin is running. Plugins can be "reattached", so the host process can be upgraded while the plugin keeps running.
✏️ Plugins are secure: they have access only to the interfaces and arguments given to them, not to the entire memory space of the host process.

In a cloud ecosystem, plugins can be delivered as init containers: during startup, the plugin binary from the init container is copied into the main app container.

If you're designing a pluggable architecture, HashiCorp's RPC plugins are definitely an approach worth looking at.

#systemdesign #engineering

TechLead Bits

Note Taking: Knowledge Base Approach

Recently I published a post about overall Obsidian usage. Today I'd like to share some tips on how I organize my knowledge base (task management deserves a separate post 😉).

For the last few years I've worked as a technical leader at different levels (from 1-2 teams to a division with 5-7 teams). So it's really important for me to remember the technical context of each component under my supervision, the customers I work with, roadmaps, agreements, deadlines, and the status of ongoing activities.

So I came up with the following structure:

✏️ Projects. This domain contains information about time-limited activities: key technical architecture details, requirements, limitations, milestones, stakeholders, etc.

✏️ Product. I work in a product company, so I keep notes on the product areas I'm responsible for. I organize them by components or architecture concerns like Security, Backup\Restore, Data Streaming, etc. Additionally, I have sections for common parts like release plans, research, quality tracking, etc.

✏️ References. This domain covers materials that are the subject of my professional interest. I split them into soft skills and technical skills. I usually write short summaries in my own words for quick reference and add links to the original sources.

✏️ People. As a leader I work with people on their growth, so I track communication history with all agreements, roadmaps, and feedback.

✏️ Templates. Obsidian supports note templates. I have templates for meeting minutes, ADRs, and some other cases. They help me quickly create notes with a predefined structure.

✏️ Archive. Anything that is no longer relevant. I don't delete outdated notes; I move them to the archive for history.

In addition to the folder structure, I actively use cross-references between notes and tags.

The described approach works for me, but I can't guarantee it will work for you. That's why I recommend checking 2-3 widely used techniques like Second Brain and picking one to begin with. Start with the simplest option and adapt it step by step to your needs and personal convenience.

#softskills #productivity

TechLead Bits

Note Taking with Obsidian

Today I want to share my experience using Obsidian for personal productivity.

I'm a really conservative person when it comes to tools, so for many years I used hand-written notes and Notepad++. But when I got more responsibilities and teams, I realized it was difficult to keep all the contexts, plans, and agreements in my head or in a pile of files.

So I made one more attempt to find a better approach. My criterion was simple: if I can start using an app within 10 minutes and it doesn't annoy me, it's a winner 😃. Obsidian was love from the first screen.

At first glance, it's really simple: just a tree of files and folders, markdown and tags. That's enough to start. Of course, the real power of Obsidian is in its plugins. Obsidian has a great community with hundreds of plugins for different use cases.

Finally I came up with the following configuration:

✏️ 2 Obsidian Vaults:
1) Personal: my personal knowledge base. I use it with paid sync across all devices.
2) Work: everything related to work. This vault lives only on my work notebook (you know, NDA, security 😉)

✏️ Folders. I organize data by domains to group projects or huge topics.

✏️ Tags. It's very convenient to mark all information and tasks to easily search them in the future.

✏️ Tasks. It's a separate plugin that allows creating tasks and specifying due dates and priorities. But the best part is that it gives you the ability to query and group tasks from different files into a single view. For example, I have a file Today with the following query:

```tasks
not done
due before tomorrow
sort by due
short mode
group by tags
```


✏️ Excalidraw. A plugin to draw simple Excalidraw diagrams.

✏️ Diagrams. The plugin integrates draw.io diagrams directly into Obsidian (important: there is a bug, you need to disable the Sketch style in settings to make it work). The edit view is not as convenient as the draw.io app, so I still prepare diagrams in the native app and use this plugin for preview purposes.

If an IDE is the workplace for your code, Obsidian is the workplace for your thoughts. My knowledge base keeps growing, and now I don't understand how I survived without it before 🙂.

#softskills #productivity

TechLead Bits

Backup Strategy: Choosing the Right Backup Type

When backup requirements are collected and data is classified, it's time to choose backup frequency and type:

✏️ Full Backup. A complete copy of the data is created and sent to another location (a different data center or cloud region). The approach is simple but time- and resource-consuming. Full backups are usually done daily or weekly.

✏️ Incremental Backup. It saves only the changes made since the last backup. As the data volume is relatively small, this approach is fast and consumes less storage. But the recovery procedure takes more time, as more backup files need to be applied: the full backup plus each increment. That's why it's often combined with daily or weekly full backups. Incremental backups run every 15-60 minutes, depending on how much data you can afford to lose (RPO).

✏️ Differential Backup. It keeps all changes since the last full backup. This type stores more data than incremental backups, but recovery is faster, as only the full and the differential backup files are applied. It's also used in combination with full backups.

✏️ Forever Incremental Backup. A full backup is performed only once; after that, only increments are saved. To restore the data, all incremental backups must be applied in sequence.

✏️ Synthetic Full Backup. It's an optimized version of forever incremental backups: it combines the last full backup with recent incremental backups into a new "synthetic" full backup, which speeds up recovery.

Most cloud storage services support at least full and incremental backups. Other types often depend on the backup software you're using. Once the backup types and schedule are defined, you can also estimate backup storage size and costs.
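
As a rough, illustrative estimate (all numbers are made up): a 500 GB weekly full backup plus ~25 GB daily incrementals with a 4-week retention policy means keeping roughly 4 x 500 GB + 24 x 25 GB ≈ 2.6 TB, before compression, deduplication, and replication to a second location.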

#engineering #systemdesign #backups

TechLead Bits

NATS: The Opensource Story

Opensource projects play a key role in modern software development. They are widely used in building commercial solutions: we all know and actively adopt Kubernetes, PostgreSQL, Kafka, Cassandra, and many other really great products. But opensource comes with a risk: the risk that one day a vendor will change the license to a commercial one (remember the story around Elasticsearch 😡?).

If a project becomes commercial, what can be done:
✏️ Start paying for the product
✏️ Migrate to an opensource or home-grown alternative
✏️ Freeze the version and provide critical fixes and security patches on your own
✏️ Fork the project and start contributing to it

The actual cost of the decision will vary depending on the product's importance and complexity. But either way, it means extra cost and effort.

That’s why, when choosing open source software, I recommend paying attention to the following:
✏️ Community: check the activity in the GitHub repo, response time to issues, release frequency, and the number of real contributors
✏️ Foundation: if a project belongs to the Linux Foundation or CNCF, the risk of a license change is very low

That's why I found the story around NATS (https://nats.io/) really interesting. Seven years ago, the NATS project was donated by Synadia to the CNCF. Since then, the community has grown to ~700 contributors. Of course, Synadia continues to play an important role in NATS development and its roadmap.

But in April, Synadia officially requested to take the project back, with plans to change the license to the Business Source License. From the CNCF blog:

Synadia’s legal counsel demanded in writing that CNCF hand over “full control of the nats.io domain name and the nats-io GitHub repository within two weeks.”


This is the first attempt I've seen to take a project fully back and exit a foundation. If it succeeds, it will set a dangerous precedent in the industry and kill trust in opensource foundations. After all, they exist exactly to prevent such cases and provide protection from vendor lock-in.

The good news is that on May 1st the CNCF defended its rights to NATS and reached an agreement with Synadia to keep the project within the CNCF under Apache 2.0. In this story, the CNCF demonstrated its ability to protect its projects, so belonging to the CNCF is still a good indicator when choosing opensource projects for your needs.

#news #technologies

TechLead Bits

Template for storyline of the pitch

Story Sample:
1. Your upcoming presentation deserves to be amazing.
2. Take a deep breath. A big presentation is coming up.
3. But how do you grab people’s attention?
4. Imagine how great it can be.
5. The same old “as usual” presentation doesn't work anymore.
6. Maybe it’s time to try something new?
7. There’s a simple way.
8. You only need three things…
9. It’s the pop-up pitch!
10. What do you have to win?

Source: https://www.danroam.com/

#booknook #softskills #presentationskills #leadership

TechLead Bits

The Pop-up Pitch

Do you ever have situations when you need to sell your ideas to management? Or explain your solution to the team? Or convince someone to go with a selected approach?
The Pop-up Pitch: The Two-Hour Creative Sprint to the Most Persuasive Presentation of Your Life is a really helpful book on how to do that from the master of visualization Dan Roam (an overview of his book on visualization is here).

As you can guess from the book's name, it focuses on creating persuasive presentations. As a base, the author uses storytelling principles, sketching, simplicity, and emotional involvement to capture the audience's attention.

Main Ideas:

✏️ To run a successful meeting, you need to define its purpose. The pop-up pitch focuses on meetings for presenting new ideas and meetings for sales (requesting an action).

✏️ Every meeting is about persuasion. The most effective approach is positive persuasion, when you don't put pressure on people but attract and emotionally involve them. Positive persuasion consists of 3 elements:
1. Benefits. The presenter truly believes the idea is beneficial for the audience.
2. Truth. The idea is something the audience actually wants to get.
3. Achievability. We can do that with a clear step-by-step plan.

✏️ Visual Decoder. You should always start preparation by describing your idea along the following dimensions:
- Title – What’s the story about?
- Who? What? – Main characters and key elements
- Where? – Where things happen and how people/parts interact
- How many? – Key numbers and quantities, measurements
- When? – Timeline and sequence of events
- Lessons Learned – What the audience should remember at the end

✏️ Pitch. It's a 10-minute presentation based on storytelling techniques. Your story should have an element of drama, ups and downs. The whole storyline consists of 10 steps:
1. Clarity. A clear and simple story title.
2. Trust. Establish a connection with the audience, show that you understand their problems.
3. Fear. Problem explanation.
4. Hope. Show what successful results could look like.
5. Sobering Reality. We cannot keep doing the same thing; we need to change the approach to achieve different results.
6. Gusto. Offer a solution.
7. Courage. Show the result is achievable with key steps and a clear plan.
8. Commitment. Explain what actions are needed.
9. Reward. Show what the audience can get in the near future.
10. True Aspiration. The big long-term win.

The book is well-structured, with step-by-step guidelines for applying its recommendations in practice. It made me rethink my approach to some meetings and keep in mind that the most important thing in a presentation is not what you want to say, but what the audience is ready to hear.

#booknook #softskills #presentationskills #leadership

TechLead Bits

Measuring Software Development Productivity

The more senior your position, the more you need to think about how to communicate and evaluate the impact of your team's development efforts. The business doesn't think in features and test coverage; it thinks in terms of business benefits, revenue, cost savings, and customer satisfaction.

There was an interesting post on this topic in the AWS Enterprise Strategy Blog called A CTO’s Guide to Measuring Software Development Productivity. The author suggests measuring development productivity in 4 dimensions:

✏️ Business Benefits. Establish a connection between a particular feature and the business value it brings. Targets must be clear and measurable: for example, “Increase checkout completion from 60% to 75% within three months” instead of “improve sales”. When measuring cost savings from automation, track process times and error rates before and after the change to show the difference.

✏️ Speed to Market. This is the time from requirement to feature delivery in production. One of the tools that can be used here is value stream mapping: you draw your process as a set of steps and then analyze where ideas spend time, whether in active work or waiting for decisions, handoffs, or approvals. This insight helps you plan and measure future process improvements.

✏️ Delivery Reliability. This dimension is about quality. It covers reliability, performance, and security. You need to translate technical metrics (e.g., uptime, RPS, response time, number of security vulnerabilities) into business metrics like application availability, customer experience, security compliance, etc.

✏️ Team Health. A burned-out team cannot deliver successful software. The leader should pay attention to teams juggling too many complex tasks, constantly switching between projects, and working late hours. These problems predict future failures. Focused teams are a business priority.

The author's overall recommendation is to start with small steps, dimension by dimension, carefully tracking your results and sharing them with stakeholders at least monthly. Strong numbers shift the conversation from controlling costs to investing in growth.

From my perspective, this is a good framework that can be used to communicate with the business and talk with them using the same language.

#leadership #management #engineering

TechLead Bits

ReBAC: Can It Make Authorization Simpler?

Security in general, and authorization in particular, is one of the most complex parts of big tech software development. At first glance, it looks simple: invent some roles, add verification at the API level, and, to make it configurable, put the mapping somewhere outside the service. Profit!

The real complexity starts at scale, when you need to map hundreds of services with thousands of APIs to hundreds of e2e flows and user roles. Things get even more complicated when you add dynamic access conditions like time of day, geographical region, or contextual rules. And you have to present that security matrix to the business, validate it, and test it. In my practice, that's always a nightmare 🤯.

So from time to time I check what's out there in the industry that can help simplify authorization management. This time I watched the talk Fine-Grained Authorization for Modern Applications from NDC London 2025.

Interesting points:
✏️ The talk introduces ReBAC, relationship-based access control. This model allows calculating and inheriting access rules based on relationships between users and objects
✏️ To use the approach, a special authorization model should be defined. It's a kind of declarative configuration that describes the types of entities and their relationships (see the sketch after this list)
✏️ Once you have a model, you can map real entities to it and set allow\deny rules
✏️ The opensource tool OpenFGA already implements ReBAC. It even has a playground to test and experiment with authorization rules.
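
For illustration, an OpenFGA-style model for a simple document-sharing case might look roughly like this (a sketch of the DSL; check the OpenFGA docs for the exact syntax of your version):

```
model
  schema 1.1

type user

type document
  relations
    define owner: [user]
    define editor: [user] or owner
    define viewer: [user] or editor
```

With such a model, a tuple like "user:anne is owner of document:readme" automatically makes Anne an editor and a viewer of that document as well.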

The overall idea may sound interesting, but the new concept still doesn't solve the fundamental problem of how to manage security at scale. It's just yet another way to produce thousands of authorization policies.

The author mentioned that the implementation of OpenFGA is inspired by Zanzibar, Google's authorization system. There is a separate whitepaper that describes the main principles of how it works, so I added it to my reading list and will probably publish some details in the future 😉.

#architecture #security

TechLead Bits

Kafka 4.0 Official Release

If you’re a fan of Kafka like I am, you might know that Kafka 4.0 was officially released last week. Besides being the first release that operates entirely without Apache ZooKeeper, it also contains some other interesting changes:

✏️ The Next Generation of the Consumer Rebalance Protocol (KIP-848). The team promises significant performance improvements and no “stop-the-world” rebalances anymore (see the config sketch after this list)
✏️ Early access to the Queues feature (I already described it here)
✏️ Improved transactional protocol (KIP-890) that should solve the problem with hanging transactions
✏️ The ability to whitelist OIDC providers via the org.apache.kafka.sasl.oauthbearer.allowed.urls property
✏️ Custom processor wrapping for Kafka Streams (KIP-1112), which should simplify reusing common code across different stream topologies
✏️ Default values for some parameters were changed. This is effectively a public contract change with potential issues during upgrade, so be careful with it (KIP-1030)
✏️ A big housekeeping work was done, so the version removes a lot of deprecations:
- v0 and v1 message formats were dropped (KIP-724)
- Kafka client versions <=2.1 are no longer supported (KIP-1124)
- APIs and configs deprecated prior to version 3.7 were removed
- The old MirrorMaker (MM1) was removed
- Support for old Java versions was removed: clients now require Java 11+, brokers Java 17+
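
A sketch of the client-side opt-in for the new rebalance protocol, assuming both brokers and clients are on 4.0 (the property comes from KIP-848; verify the exact name and default against the official docs for your version):

```properties
# consumer.properties: opt in to the KIP-848 consumer group protocol
group.protocol=consumer
```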

Full list of changes can be found in release notes and official upgrade recommendations.

The new release looks like a significant milestone for the community 💪. As always, before any upgrade I recommend waiting for the first patch versions (4.0.x), which will probably contain fixes for the most noticeable bugs and issues.

#engineering #news #kafka

TechLead Bits

My task statuses configuration

#softskills #productivity

TechLead Bits

Documentation as a Code: Tips & Tricks

Last week I shared the Documentation as Code approach that I actively use in my teams. But to be honest, when you first introduce it, you'll face some resistance.

The usual objections sound like: "The wiki is more convenient", "It has better formatting", "It has native integration with diagrams" (draw.io, Excalidraw, or whatever you use).

I've heard them all, and I want to share some tips on how to address them.
Disclaimer: I work mostly with GitLab and GitHub, but I'm sure this covers the majority of use cases.

✏️ Diagrams Integration. I use drawio diagrams as they can be easily integrated as pictures into markdown docs:
- PNG Option. You can create diagrams in the draw.io app and export them as PNG with the “Include a copy of my diagram” option. That way, the image is stored in your Git repo, easily embedded in markdown, and still fully editable later — just reopen it in draw.io.
- SVG Option. Another option is to use the SVG format plus the draw.io plugin for your IDE. You can add SVG files directly to your markdown documents and edit them with draw.io later. I use it with IntelliJ IDEA, and there is a similar extension for VS Code.

✏️ Complex Formatting. Honestly, in 99% of cases you don't need it, and the issue is solved by restructuring the document. But when you do need something special, you can use HTML inside markdown.

✏️ Convenience. It's quite a subjective point, but I don't know developers who cannot easily work with markdown inside an IDE. By the way, I even write my personal notes in markdown; it's just a habit 😉

✏️ Build docs into your dev process. Documentation must be updated together with the code that brings the changes. To enforce that, I include a Definition of Done checklist in the MR\PR template, where updating the docs is one of the standard items, like writing tests.
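
For example, a fragment of such an MR\PR template might look like this (the items are illustrative):

```markdown
## Definition of Done
- [ ] Tests added or updated
- [ ] Docs in the repo updated in this MR (or "no doc impact" stated explicitly)
- [ ] Diagrams re-exported if the architecture changed
```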

✏️ Linting. When docs live in the repo, you can apply some automated quality control checks as for any other code. For example, use prettier to keep docs consistently formatted.

✏️ GenAI Integration. Markdown is really well suited for LLM integration, and it's a good starting point for plugging in a GenAI bot.

✏️ Keep the Wiki. If you still want to keep your wiki, you can autogenerate it from the markdown sources. There are a bunch of tools for that: MkDocs, GitHub\GitLab Pages, the Confluence git plugin, etc.

That's usually enough for teams to get started. After that, you can tune the process to fit your needs by adding more tools, linters, or integrations.
Start simple, make documentation part of your dev routine, and your documentation will stay alive.


#engineering #documentation

TechLead Bits

Documentation As a Code

I have a really strong opinion that the documentation is part of the application. It should be developed, updated and reviewed using the same processes and tools as the application code.

If the documentation is stored somewhere else, like in a separate wiki, it's basically dead within 5 minutes after it's published.

This means documentation should live in the git repo. If some system behavior is changed during bugfixing or a new feature development, the relevant documentation should be updated in the same PR. This approach helps to keep documentation up to date.

It's really simple if you use a monorepo: all docs and code live in one place, so it's easy to find what you need. Things become more complicated if you have lots of microrepos. Even if the docs are up to date, it's quite hard for users to find them. Usually, this is solved by publishing docs to a central portal as part of the CI process, or nowadays by adding an AI bot to help.
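
For example, a minimal GitLab Pages job that publishes MkDocs output as part of CI might look like this (a sketch; the image and paths are assumptions to adapt to your setup):

```yaml
# .gitlab-ci.yml
pages:
  image: python:3.12
  script:
    - pip install mkdocs
    - mkdocs build --strict --site-dir public   # render markdown from the repo to HTML
  artifacts:
    paths:
      - public                                  # GitLab Pages serves the "public" artifact
  rules:
    - if: '$CI_COMMIT_BRANCH == $CI_DEFAULT_BRANCH'
```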

Recently, Pinterest published an article about how they adopted the documentation-as-code approach. Since they use microrepos, the main challenge was to make documentation discoverable for their users across hundreds of repos.

What they did:
🔸 Moved their docs to git repos using markdown.
🔸 Used MkDocs in CI to generate HTML versions of the docs.
🔸 Created a central place to host and index docs called PDocs (Pinterest Docs).
🔸 Integrated docs with GenAI — an AI bot connected to the main company communication channels.
🔸 Built a one-click tool to migrate old wiki pages to git.

I don’t know of any standard solution for doc aggregation across multiple repos, so it would be great if Pinterest open-sourced their PDocs in the future. I think it could really help a lot of teams improve their documentation processes.

#engineering #documentation

TechLead Bits

Latency Insurance: Request Hedging

One more interesting concept I came across recently is request hedging. I haven't seen it actively used in enterprise software, but it can be useful in scenarios where tail latency is critical.

Imagine that service A calls service B, and service B has multiple instances. These instances can have different response times: some are fast, some are slow. There are a number of potential reasons for such behavior, but we'll skip them for simplicity.

Request hedging is a technique where the client sends the same request to multiple instances, uses the first successful response, and cancels the other requests.

Obviously, if you do this for all requests, the system load will increase and the overall performance will degrade.

That's why hedging is usually applied only to a subset of requests.

The following strategies are used to select requests for hedging:
✏️ Token Bucket. Use a token bucket that refills every N operations and send a hedged sub-request only if a token is available (rate limiting).
✏️ Slow Responses Only. Send a hedged request only if the first request takes longer than a specific latency threshold (e.g., the 95th or 99th percentile).
✏️ Threshold. Send hedged requests only when the Nth percentile latency exceeds expectations. For example, if the threshold is the 99th percentile, only 1% of requests will be duplicated.
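
A minimal Go sketch of the "slow responses only" flavor; the call signature and the threshold handling are illustrative, not taken from any specific library:

```go
package hedging

import (
	"context"
	"time"
)

// HedgedCall sends the primary request and, if no response arrives within
// threshold (e.g. the observed p95 latency), fires one extra hedge request.
// The first successful response wins; the losing call is cancelled via ctx.
func HedgedCall(
	ctx context.Context,
	call func(context.Context) (string, error),
	threshold time.Duration,
) (string, error) {
	ctx, cancel := context.WithCancel(ctx)
	defer cancel() // cancel whichever request is still in flight

	type result struct {
		val string
		err error
	}
	results := make(chan result, 2) // buffered so late goroutines never block

	launch := func() {
		go func() {
			v, err := call(ctx)
			results <- result{v, err}
		}()
	}

	launch() // primary request
	timer := time.NewTimer(threshold)
	defer timer.Stop()

	outstanding := 1
	for {
		select {
		case <-timer.C:
			launch() // primary is slow: send the hedge request
			outstanding++
		case r := <-results:
			if r.err == nil || outstanding == 1 {
				return r.val, r.err
			}
			outstanding-- // one attempt failed, wait for the other one
		case <-ctx.Done():
			return "", ctx.Err()
		}
	}
}
```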

Request hedging is an efficient approach to reducing tail latency. It prevents occasional slow operations from slowing down the overall user interaction. But if the variance in a system is already small, request hedging will not provide any improvement.

#systemdesign #patterns

TechLead Bits

How to Overcome Procrastination

I'm usually quite skeptical about books with flashy titles that promise to make you more productive, more successful, and all the other "mores". But the book Do It Today: Overcome Procrastination, Improve Productivity, and Achieve More Meaningful Things by Darius Foroux caught my attention because it recommends some practices I actively use.

Let me explain by an example.
I usually read 3-5 books at the same time (meaning I start reading a new book without finishing the previous one). The book selection depends on my mood, energy level, and how much time I have. Sometimes I'm ready to dig into complex engineering topics. Other times I prefer to read something lighter around soft skills development. If I feel exhausted and want to recharge, I pick up fiction.

It always sounds strange to people when I explain this way of reading 😃

So when I saw that reading multiple books is one of the productivity tips, I thought I had something in common with the author and that the other recommendations would probably also suit me.

The author shares 33 tips for improving personal productivity.

I'll highlight the most important from my perspective:
✏️ Track where you spend your time. Before starting a task, ask yourself: _Do I really need to do this? What happens if I don’t?_
✏️ Plan your day from the night before.
✏️ Write a short summary about completed tasks at the end of the day.
✏️ Pick your outfit for the next day in advance.
✏️ Read every day (books, not Internet scrolling). Surround yourself with paper books; reading multiple books at the same time is fine.
✏️ Spend more time with your loved ones.
✏️ Perform physical activity every day.

The book won't teach you anything really new. But the fact is that even when we know productivity recommendations, we usually don't follow them. From this perspective, the book gives you a push to take the first steps in the right direction. And the engaging style of writing, with a good dose of humor, makes it easy to read and follow.

#booknook #softskills #productivity

TechLead Bits

What Does Technical Leadership Mean?

There is a lot of speculation in the industry about the term technical leadership, especially about who a Tech Lead is and who is not. In some companies, Technical Lead is an official title. In others, it can be a Team Lead or an Architect.

But for me, technical leadership is something broader.
Is an architect a technical leader? A staff engineer? A CTO?
My answer is yes.

From my point of view, technical leadership is the ability to set a technical vision for the team, resolve implementation conflicts, and guide teams through architectural decisions.

There is an interesting video on the topic: Level Up: Choosing The Technical Leadership Path. The author explains his vision of technical leadership:

Technical Leadership is the act of aligning a group of people in a technical context.


Common examples are resolving conflicts in a code review about the implementation approach, aligning coding practices, defining technical contracts between teams and components, etc.

From that perspective, the author defines the following career paths:
1. Individual Contributor: Junior Engineer, Middle Engineer, Senior Engineer
2. Manager: Engineering Manager
3. Technical Leader: Staff Engineer, Tech Lead, Dev Lead

So the individual contributor path is quite limited, because from a certain level of seniority you need to collaborate with other people, present your ideas, and explain the reasoning behind your decisions. This requires a different set of skills: communication, leadership, empathy, ownership, delegation, coaching, etc. If you decide to grow in the technical leadership direction, you need to develop these skills just like any other technical skill.

#leadership #career

TechLead Bits

Secret Management Platform at Uber

Secret management is one of the biggest pain points for modern cloud applications, especially when you need to implement credential rotation across different types of secrets like OAuth2 clients, database credentials, integration secrets, etc.

Last week Uber published an article about their approach to solving this task:
✏️ Automatic scans for hardcoded passwords at the PR level
✏️ A centralized secrets management platform with APIs, UI, and CLI to unify CRUD operations
✏️ A secrets inventory with information about owners, secret provider, rotation policy, deployment platform, and security impact level
✏️ Integration with HashiCorp Vault (installed per region) on-premise, and with cloud secret managers (AWS, GCP) for apps in public clouds
✏️ Secret rollout via integration with deployment systems (Uber has 3 of them)
✏️ Monitoring of new secret rollouts and failure detection
✏️ Automatic rollback to the previous secret value in case of failure
✏️ Monitoring and cleanup of orphaned secrets

The authors say this system allows Uber to automatically rotate around 20,000 secrets per month with no human intervention. Moreover, they mention that they are actively working on secretless authentication to reduce dependencies on traditional secrets. That direction sounds promising: the fewer secrets you have, the simpler they are to manage.

#engineering #usecase #security

TechLead Bits

Backup types visualization

#engineering #systemdesign #backups

TechLead Bits

Backup Strategy: Identify Your Needs

Data loss is one of the biggest business risks in running modern software systems. And to be honest, it's rarely caused by infrastructure failures. In most real-life cases, it's the result of a buggy upgrade or a human mistake that accidentally corrupts the data.

That's why properly organized backups are the foundation of any disaster recovery plan (a DR strategies overview can be found here).

If you have a monolithic app deployed on a single VM, the strategy is simple: just perform a full VM backup. But for microservice solutions with hundreds of services, different types of databases, and other persistent storage, the task becomes non-trivial.

To build your own backup strategy, you need to answer the following questions:

✏️ What type of data sources do you have? It can be databases, queues, file and blob storages, VMs and other infrastructure components.
✏️ What data is business critical? Classify data based on criticality; different types of data can have different RPO requirements and may even allow some data loss.
✏️ Is the data primary or secondary? Some data can be reproduced from other sources (search indexes, streaming data, deployment configuration, etc.), and it's cheaper to restore it from the original source than to perform a backup and fix consistency issues.
✏️ What are the RPO and RTO requirements? Based on that, you will set the backup frequency. For example, if the RPO is 15 minutes, then you'll need to schedule backups at least every 15 minutes.
✏️ Are there any compliance rules? Some regulations require keeping data for a specific period of time (e.g., billing and revenue data, personal data). That mostly impacts backup retention policies and the required hardware.

Based on the answers, you can choose suitable backup types, schedule, recovery, and testing strategies. More about that in future posts 😉

#engineering #systemdesign #backups

TechLead Bits

The Subtle Art of Support

At IT conferences the main focus is usually on how to build a spaceship (or at least a rocket! 😃) with the latest technologies and tools. Everyone enjoys writing new features, but almost nobody is excited about fixing bugs. That's why I was really surprised to see a talk about support work: The subtle art of supporting mature products.

The author shared her experience organizing an L4 support team. Most of the recommendations are fairly standard, like organizing training sessions, improving documentation, talking with your clients, etc. But the idea of having fully separate support and development teams really confused me.

From my point of view, such a model makes sense only for one-time project delivery: you develop something, deliver it to the customer, hand over support, and move on. But it's totally wrong for actively developed products.

In this case, separating support from development breaks the feedback loop (we're talking about L4 product support, of course). You simply cannot improve a product in the right way if you're not in touch with your customers and their pain. Support is a critical activity for the business: nobody cares about new features if the existing features don't work.

I prefer a model where teams own a product or component. It means the team is responsible for both development and support. The better quality you have, the more capacity you can spend on feature development. Such a model creates really good motivation to work on internal stability, process optimization, and overall delivery quality.

One of the simplest ways to implement the approach is to rotate people between support and development work for a sprint, a few sprints, or a full release. In my practice, a rotation of 2-3 sprints works quite well.

Of course, I often hear the argument that support requires a lot of routine communication because users just "don't use features correctly", and that's why it should be handled by other people. But for me, that's a sign that something is wrong: the product is hard to use, the documentation is poor, test cases are missing, etc. That's exactly the point where you should do some analysis and make improvements. And in the era of GenAI, teams can automate a lot of the support routine and focus on making their products really better.

#engineering #leadership

TechLead Bits

Template for Visual Decoder

Source: https://www.danroam.com/

#booknook #softskills #presentationskills #leadership

TechLead Bits

Are Microservices Still Good Enough?

There has been a lot of hype around microservices for many years. Sometimes they are used for good reasons, sometimes not. But it looks like the time of fast growth has come to an end, and companies have started to focus more on cost reduction. That promotes a more pragmatic approach to architecture selection.

One of the recent articles about this topic is Lessons from a Decade of Complexity: Microservices to Simplicity.

The author starts with downsides of microservice architecture:
✏️ Too many tiny services. Some microservices become too small.
✏️ Reliability didn't improve. One small failure can trigger a cascading failure of the system.
✏️ Network complexity. More network calls produce higher latency.
✏️ Operational and maintenance overhead. Dedicated deployment pipelines, central monitoring, logging, alerting, resource management, upgrade coordination: this is just a small part of what's needed to run the architecture.
✏️ Poor resource utilization. Microservices can be so small that even 10 millicores are not utilized, which makes cluster-wide resource management ineffective.

Recommendations to select the architecture:
✏️ Be pragmatic. Don’t get caught up in trendy architecture patterns, select what's really needed for your task and team now.
✏️ Start simple. Keeping things simple saves time and pain in the long run.
✏️ Split only when needed. Split services when there’s a clear technical reason, like performance, resource needs, or special hardware.
✏️ Microservices are just a tool. Use them only when they help your team move faster, stay flexible, and solve real problems.
✏️ Analyze tradeoffs. Every decision has upsides and downsides. Make the best choice for your team.

Additionally, the author shared a story where he and his team consolidated hundreds of microservices into larger ones, reducing the total number from hundreds to fewer than ten. This helped cut down alerts, simplify deployments, and improve infrastructure usage. Overall, supporting the solution became easier and less expensive.

I hope the cost effectiveness of technical decisions has finally become a new trend in software development 😉.

#engineering #architecture

TechLead Bits

Technology Radar

At the beginning of April, Thoughtworks published a new version of the Technology Radar with the latest industry trends.

Interesting points:

✏️ AI. There is significant growth of the agentic AI approach in technologies and tools, but all of them still work in a supervised fashion, helping developers automate routine work. No surprises there.

✏️ Architecture Advice Process. The architecture decision process is moving to a decentralized approach where anyone can make an architectural decision after getting advice from people with the relevant expertise. The approach is based on Architecture Decision Records (ADRs) and advisory forum practices. I made a short ADR overview here.

✏️ OpenTelemetry Adoption. Most popular tools (e.g. Loki, Alloy, Tempo) in observability stack added OpenTelemetry native support.

✏️ Observability & ML Integration. Major monitoring platforms embedded machine learning for anomaly detection, alert correlation and root-cause analysis.

✏️ Data Product Thinking. With growing AI adoption, many teams have started treating data as a product with clear ownership, quality standards, and a focus on customer needs. Data catalogs like DataHub, Collibra, Atlan, or Informatica are becoming more popular.

✏️ Gitlab CI\CD was moved to the Adopt ring.

Of course, there are many more items in the report, so if you're interested, I recommend checking it and finding the trends relevant to your tech stack.

Since this post is about trends, I'll share one more helpful tool: StackShare. It shows the tech stacks used by specific companies and how widely a particular technology is adopted across different companies.

#news #engineering

TechLead Bits

Netflixed - The Epic Battle for America's Eyeballs

Recently I visited a bookshop to pick up a pocket book for a long flight. I noticed something with the word Netflix on the cover and decided to buy it. It was Netflixed: The Epic Battle for America's Eyeballs by Gina Keating.

Initially I thought it was a book about technology or leadership, but it turned out to be the story of Netflix's path to success. The book was published in 2013, yet it's still relevant, as Netflix remains a leader in online streaming today.

The author tells Netflix's history, from an online DVD rental service to online movie streaming. A major part of the book focuses on Netflix's competition with Blockbuster (America's biggest DVD and media retailer at that time). It's really interesting to see how their market and optimization strategies went through different stages of technology evolution.

I won’t retell the whole book, but there's one moment that really impressed me. Blockbuster was one step away from beating Netflix and becoming the market leader in online movie services. But at that critical time, disagreements among Blockbuster's top management led to the company's crash.

Most board members failed to see that the DVD era was ending and Internet technologies were the future. They fired the executive who drove the online program and brought in a new CEO with no experience in the domain. The new CEO decided to focus on expanding physical DVD stores and didn't want to hear about new technologies at all. That led to Blockbuster's bankruptcy.

What can we learn from this? Some managers cannot accept that they are wrong, and a bad manager can ruin a whole business. Good leaders must listen to their teams, understand industry trends, and be flexible enough to adapt to change. For me, the book read like a drama, even though I already knew how it ends.

#booknook #leadership #business

TechLead Bits

Adaptive Integration

Modern solutions typically consist of a mix of services, functions, queues, and databases. To implement an E2E scenario, developers need to build a chain of calls to get the result. And if some API changes, the whole E2E flow may break.

Of course, we have proto specs, OpenAPI, and autogenerated clients, but the problem is that any change brings significant adoption overhead to all of its dependents.

Marty Pitt, in his talk Adaptive Architectures - Building API Layers that Build Themselves, presents an attempt to solve this problem and make changes cheap and fully automated.

I like the problem statement part; it really captures the pain of the existing microservice ecosystem: change an API and the integration is broken, change a message format and the integration is broken, change a function... you get the idea, right? So you need to be really careful with any contract change and work with all your consumers to make the migration smooth.

The author then argues that the root cause of the problem is the lack of business semantics in our API specs. If we add them, the system can automatically generate call chains to perform any requested operation.

The idea can be represented as the following steps:
✏️ Add semantics to the entities: for example, instead of int id use accountId id across all services in the organization
✏️ Register service specs during startup in a special integration service.
✏️ Any service can call the integration service using a DSL, like Get balance for the account X with a specified email
✏️ The integration service automatically generates an execution chain based on the registered specs. After that, it orchestrates all the queries and returns the result to the caller.
✏️ If a service changes its API, it simply uploads a new spec version, and the integration service rebuilds the call chain accordingly.

The author and his team have already implemented the approach in https://github.com/taxilang/taxilang and https://github.com/orbitalapi.

From my point of view, a system that decides at runtime which APIs to call to perform a business transaction looks uncontrollable and difficult to troubleshoot. So I'm not ready to use this approach in real production. But the idea sounds interesting; let's see whether the adoption of such tools grows in the future.

#engineering
