
TechLead Bits Telegram channel (techleadbits)


Explore articles, books, news, videos, and insights on software architecture, people management, and leadership. Author: @nelia_loginova


TechLead Bits

A graphical representation of concepts from the book

Source: Balancing Coupling in Software Design

#booknook #engineering


TechLead Bits

Airbnb: Large-Scale Test Migration with LLMs

Amid all the hype about LLMs replacing developers, I really like reading practical examples of how LLMs are used to solve engineering tasks. Last week Airbnb published the article Accelerating Large-Scale Test Migration with LLMs, where they describe their experience automating the migration of ~3.5K React test files from Enzyme to React Testing Library (RTL).

Interesting points there:
✏️ The migration was built as a pipeline with multiple steps, where a file moves to the next stage only after validation of the previous step passes
✏️ If validation fails, the result is sent back to the LLM with a request to fix it
✏️ For small and mid-size files the most effective strategy was brute force: retry the steps multiple times until they pass or a retry limit is reached.
✏️ For huge, complex files the context was extended with the source code of the component, related tests in the same directory, general migration guidelines, and common solutions. The authors note that the main success driver there was not prompt engineering but choosing the right related files.
✏️ The overall result was a successful migration of 97% of the tests; the remaining part was fixed manually.

The overall story shows huge potential for automating routine tasks. Even with a custom pipeline and some tooling around it, the migration with an LLM was significantly cheaper than doing it manually.
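To make the retry-until-valid idea concrete, here is a minimal sketch of one such pipeline step (my own illustration, not Airbnb's actual tooling; the call_llm and run_validation callables are hypothetical stand-ins):

```python
from typing import Callable, Optional, Tuple

def migrate_file(
    source: str,
    call_llm: Callable[[str], str],                      # hypothetical LLM client
    run_validation: Callable[[str], Tuple[bool, str]],   # e.g. lint + run the test
    max_retries: int = 10,
) -> Optional[str]:
    """Ask the LLM to migrate one test file; retry with the errors until it passes."""
    prompt = f"Migrate this Enzyme test to React Testing Library:\n{source}"
    for _ in range(max_retries):
        candidate = call_llm(prompt)
        ok, errors = run_validation(candidate)
        if ok:
            return candidate  # promote the file to the next pipeline stage
        # Brute-force strategy: feed the failure details back to the model.
        prompt = (
            "The migrated test below fails validation.\n"
            f"Errors:\n{errors}\n\nFix it:\n{candidate}"
        )
    return None  # retry budget exhausted; the file goes to the manual-fix bucket
```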

#engineering #ai #usecase


TechLead Bits

Thinking Like an Architect

What makes a good architect different from other technical roles? If you've ever thought about that, I recommend checking out Gregor Hohpe's talk "Thinking Like an Architect".

Gregor says that architects are not the smartest people - they make everyone else smarter.

And to achieve this, they use the following tools:
✏️ Connect Levels. Architects talk with management in business language and with developers in technical language, so they can translate business requirements into technical decisions and technical limitations into business impacts.
✏️ Use Metaphors. They use well-known metaphors to explain complex ideas in a simple way.
✏️ See More. Architects see more dimensions of a problem and can do more precise trade-off analysis.
✏️ Sell Options. They estimate and prepare options, and sometimes defer decisions to the future.
✏️ Make Better Decisions with Models. Models shape our thinking. If the solution is simple, the model is good; if it's not, there is probably something wrong with the model.
✏️ Become Stronger with Resistance. Not all people are happy with changes. Architects can identify which beliefs make people's arguments feel rational to them, and by understanding this, they can influence how people think and work.

I really like Gregor's talks: they are practical, make you look at familiar things from a different angle, and contain a good dose of humor. So if you have time, I recommend watching the full version.

#architecture


TechLead Bits

DR Strategies

When RPO and RTO requirements are defined, it's time to select a DR strategy:

✏️ Backup/Restore. The simplest option, with quite a long downtime (RTO) - hours or even days:
- the application runs in a single region only
- regular backups (full and incremental) are sent to another region
- only the active region has reserved capacity to run the whole infrastructure
- in case of disaster the whole infrastructure has to be rebuilt in a new region (in some cases it can be the same region); after that the application is reinstalled and data is restored from backups

✏️ Cold Standby. This option requires less downtime, but it can still take hours to fully restore the infrastructure:
- the application runs in a single region only
- minimal infrastructure is prepared in another region: a copy of the application or data storage may be installed, but it is scaled down or runs with minimum replicas
- regular backups (full and incremental) are sent to another region
- in case of disaster the application is restored from backups and scaled up appropriately

✏️ Hot Standby. The most complex and expensive option with minimal RTO measured in minutes:
- both regions have the same capacity reserved
- all applications are up and running in both regions
- data is replicated between regions in near real-time
- in case of disaster one of the regions continues to operate.

What to select usually depends on the availability and business requirements of the services you provide. In any case, the DR plan should be defined and documented so you know what to do in case of disaster. Moreover, it's good practice to regularly test how the system is restored. Otherwise you may end up in a situation where you have a backup but cannot restore the system, or even worse, there are no up-to-date backups at all.
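As a rough illustration of how the choice could be driven by the target RTO, here is a toy mapping (the thresholds are hypothetical - in practice they come from your own business requirements):

```python
from datetime import timedelta

def pick_dr_strategy(target_rto: timedelta) -> str:
    """Toy mapping from a target RTO to a DR strategy; thresholds are illustrative only."""
    if target_rto <= timedelta(minutes=30):
        return "Hot Standby"      # both regions active, near-real-time replication
    if target_rto <= timedelta(hours=4):
        return "Cold Standby"     # minimal infra pre-provisioned, scaled up on disaster
    return "Backup/Restore"       # rebuild infrastructure and restore from backups

print(pick_dr_strategy(timedelta(hours=1)))  # -> Cold Standby
```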

#architecture #systemdesign


TechLead Bits

Region and Availability Zones concepts visualization


TechLead Bits

Responsibility Matrix from Shift-Down Security whitepaper

#engineering #security


TechLead Bits

eBPF: What's in it?

If you work with cloud apps, you've probably noticed a growing trend of using eBPF for profiling, observability, security, and networking tasks. To fully understand the potential and limitations of this technology, it's good to know how it works under the hood.

Let's look at how applications are executed from a Linux system perspective. In simple terms, everything operates in three layers:
1. User Space. It's where our applications run. This is the non-privileged part of the OS.
2. Kernel space. The privileged part of the OS that handles low-level operations. These operations usually provide access to the system hardware (file system, network, memory, etc.). Applications interact with it through system calls (syscalls).
3. Hardware. The physical device layer.

eBPF is a technology that allows you to embed a program at the kernel level, where it is triggered on particular system events like opening a file, reading a file, establishing a network connection, etc. In other words, the eBPF approach lets you monitor what's going on with your applications at the system level without code instrumentation. One of the earliest and most well-known tools based on this technology is tcpdump.
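To get a feel for what "a program embedded at the kernel level" looks like, here is a minimal sketch using the BCC Python bindings (assuming bcc is installed and the script runs with root privileges): it attaches a tiny eBPF program to the openat syscall and prints a trace line every time a process opens a file.

```python
from bcc import BPF  # BCC compiles the C snippet below into eBPF bytecode

# Tiny eBPF program: executed in the kernel every time the hook below fires.
prog = r"""
int trace_open(void *ctx) {
    bpf_trace_printk("file opened\n");
    return 0;
}
"""

b = BPF(text=prog)
# Attach the program to the kernel probe for the openat() syscall.
b.attach_kprobe(event=b.get_syscall_fnname("openat"), fn_name="trace_open")
print("Tracing file opens... Ctrl-C to stop")
b.trace_print()  # stream the messages coming from the kernel trace pipe
```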

Some interesting ways companies use eBPF now:
- Netflix introduced bpftop, a tool to measure how long processes spend in the CPU scheduled state. If processes take too long, it often points to CPU bottlenecks like throttling or over-allocation.
- Datadog shared their experience using eBPF for chaos testing via ChaosMesh.
- Confluent adopted Cilium, an eBPF-based CNI plugin for networking and security in Kubernetes.

Over the past few years I've seen more and more adoption of eBPF-based tools across the industry, and it looks like the trend will continue to grow, especially in observability and profiling.

#engineering


TechLead Bits

Software Complexity

Have you ever seen a project turn into a monster over time? Hard to understand, difficult to maintain? If so, I highly recommend Peter van Hardenberg’s talk - Why Can't We Make Simple Software?

The author explains what complexity is (it's not the same as difficulty!), why software gets so complicated, and what we can actually do about it.

Common reasons for complex software:
✏️ Defensive Code. Code that starts simple, implementing the sunny-day scenario, but grows as more edge cases are handled. Over time, it turns into a mess with too many execution paths.
✏️ Scaling. A system designed for 100 users is really different from one built for 10 million. Handling scale often adds layers of complexity.
✏️ Leaky Abstractions. A well-designed interface should hide complexity, not expose unnecessary details. (A good discussion on this is in Build Abstractions not Illusions post).
✏️ Gap Between Model and Reality. If the software model isn't actually mapped to the problem domain, it leads to growing system complexity that is really hard to fix.
✏️ Hyperspace. Problems multiply when a system has to work across many dimensions—different browsers, mobile platforms, OS versions, screen sizes, and more.

Software architecture degrades over time as changes are made. Every change can introduce more complexity, so it’s critical to keep things simple. Some strategies to do that:
✏️ Start Over. Rebuild everything from scratch. Sometimes, it is the only way forward if the existing architecture can't support new business requirements.
✏️ Eliminate Dependencies. The fewer dependencies the system has, the easier it is to predict its behavior and do impact analysis.
✏️ Reduce Scope. Build only what you actually need now. Avoid premature optimizations and "nice-to-have" features for some hypothetical future.
✏️ Simplify Architecture. No comments 😃
✏️ Avoid N-to-M Complexity. Reduce unnecessary variability to limit testing scope and system interactions.

Complexity starts when interactions appear, so it is about dynamic system behavior. Watching this talk made me reflect on why systems become so complex and how I can make better design decisions.

#architecture #engineering


TechLead Bits

Is Your Cluster Underutilized?

"In clusters with 50 CPUs or more, only 13% of the CPUs that were provisioned were utilized, on average. Memory utilization was slightly higher at 20%, on average," - 2024 Kubernetes Cost Benchmark Report.

This aligns with what I've seen: often there are not enough resources to deploy an app even when the cluster's resource usage is below 50%.

The most common reasons for that:

✏️ Incorrect requests and limits. Kubernetes reserves cluster resources based on the requests parameters, while limits prevent a service from consuming more than expected. Typical issues are:
- Requests = Limits. Resources are exclusively reserved for peak loads and cannot be shared with other services. This often happens with memory configuration (the standard explanation is preventing OOM), but most modern runtimes can manage memory dynamically (e.g., Java since v17 with the G1 and Shenandoah GCs; Go and C++ handle this natively).
- Requests > Average Usage. It may seem counterintuitive, but requests should be set below average usage. By the law of large numbers, peak loads are balanced across multiple deployments: the probability that all services hit peak usage at the same time is relatively small (see the simulation sketch after this list).

✏️ Mismatch between resource settings and runtime parameters. Requests and limits should align with language-specific configurations (e.g., GOMEMLIMIT for Go, Xms and Xmx for Java).

✏️ High startup resource requirements. Some services need a lot of CPU and memory just to start, even though they consume far less after that. This requires high resource requests just to make deployment possible. A common example is Spring applications, which consume significant resources to load all beans at startup. Using native compilation or more efficient technologies can help.
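To illustrate the law-of-large-numbers argument, here is a small simulation sketch (all numbers are made up): each of 20 services usually consumes ~100m CPU and occasionally spikes to ~500m; reserving each service's individual peak costs far more than what the combined load actually needs.

```python
import numpy as np

rng = np.random.default_rng(42)
n_services, n_samples = 20, 10_000

# Made-up usage model: ~100m CPU baseline with ~500m spikes in 5% of samples.
baseline = rng.normal(100, 10, size=(n_samples, n_services))
spikes = rng.random((n_samples, n_services)) < 0.05
usage = np.where(spikes, 500, baseline)  # millicores per service per sample

requests_at_peak = usage.max(axis=0).sum()            # requests = each service's own peak
combined_p99 = np.percentile(usage.sum(axis=1), 99)   # what the node actually needs

print(f"capacity if requests = per-service peak: {requests_at_peak:.0f}m")
print(f"p99 of combined usage:                   {combined_p99:.0f}m")
# The combined p99 is far below the sum of individual peaks, which is why
# requests sized near average usage still keep the node safe.
```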

As you can see, the real problem lies somewhere between developers and deployment engineers. Technical decisions and implementation details directly impact resource utilization and infrastructure costs.

#engineering


TechLead Bits

AWS Well-Architected Framework

How do you know that an application in the cloud is well-architected? And if you think it is, how do you prove it? There is no simple answer. Each major cloud provider, like AWS, Microsoft, and Google, has its own guidelines for that, called a Well-Architected Framework.

A Well-Architected Framework is a set of key concepts, design principles, and architectural best practices for designing and running workloads in the cloud. The recommendations are optimized for a particular provider and the usage of its services.

AWS was the first to introduce this practice, so I'll explain the framework using the AWS example.

The AWS Well-Architected Framework is based on 6 pillars:

1. Operational Excellence. AWS defines the goal of operational excellence as the ability to get new features and bug fixes into customers’ hands quickly and reliably. It contains guidelines for automation, continuous delivery, observability, and managed service usage.
2. Security. It's a more or less standard set of security recommendations: implement required security controls, apply security at all layers, be prepared for security events, etc. One of the most interesting principles there is to keep people away from data. The idea is that if people don't need direct access to the data (everything is automated), the risk of a data breach decreases.
3. Reliability. It contains practices to keep the system up and running: self-healing, redundancy, recovery procedures, autoscaling to fit increased workload, careful resource management. Most of these principles I've already covered in the Reliability blog post.
4. Performance. This pillar ensures that computing resources are used efficiently to meet system requirements. It includes experimenting with hardware, choosing the right storage infrastructure for the workload, and using already-tuned managed services.
5. Cost Effectiveness. It's defined as delivering business value at the lowest cost. Cloud financial management should be established: adopting a consumption model, measuring resource usage efficiency, and scaling resources down when they are not needed so you don't pay for them.
6. Sustainability. This pillar was added quite recently, and it considers the long-term impact of the business on the environment (I wrote about the trend in Renewable Energy Trend). AWS highlights that environmental sustainability is a shared responsibility between customers and AWS.

To sum up, the Well-Architected Framework is a set of best practices and design principles for running applications on a specific cloud. However, keep in mind that cloud providers also want to sell more of their services, so it’s important to stay a bit skeptical about managed service recommendations and carefully check the efficiency and costs for your particular workload.

#architecture #engineering


TechLead Bits

Radical Candor. Part 2: Managed Growth.

I don't believe there is any such thing as "B-player" or a mediocre human being. Everyone can be excellent at something.


The author suggests evaluating team members based on two dimensions: performance and growth.

If overall performance is positive, the employee is one of two types:
✔️ Rock star - people with gradual growth; they value stability, knowledge sharing, and attention to detail.
✔️ Superstar - people with steep growth; they quickly acquire new skills or rapidly develop existing ones. Superstars constantly increase their set of responsibilities and influence within the team.

If performance is negative, then it's the leader's job to understand what's wrong (you can check how to do that in the Why Don’t People Do What You Expect post). In some cases the best solution is to allow the person to leave the team or company.

One really important thought that I hadn't considered before is that these labels (rock star, superstar, low performer) are not permanent and can change over time. For example, when I started working at an IT company right after university, I put a lot of effort into my career growth. Then I had a child, and my priorities changed: I needed stability, a flexible schedule, and enough free time for the family. Later, when my kid grew older, I was ready to take on ambitious and challenging tasks again.

That's why it's so important to know your team and their current life priorities. A leader's job isn't to give meaning to the work but to understand what meaning each employee seeks in it. Only then can you help people grow.

#booknook #softskills #leadership #communications


TechLead Bits

Radical Candor. Part 1: Build Relationships.

There is nothing more important than listening to your people. It's not just an additional responsibility on top of the leader's job; it is the leader's job.

Trust relationships in the team are based on the following principles:
✏️ Care Personally: Show real interest in the people you work with, offer help where it's needed, and support their growth. Remember, people have a life outside of work, and that life can significantly impact their work.
✏️ Challenge Directly: Be demanding of yourself and others in a constructive way. Provide feedback even if it's negative. Stay focused on the issue, not the person. Encourage high standards and accountability to show your commitment to results.

Care Personally + Challenge Directly = Radical Candor 


Using these two dimensions, feedback falls into one of four categories:
✏️ Ruinous Empathy. The most frequent situation: avoiding tough conversations out of fear of hurting someone's feelings. It leads to cultures where nothing gets done and critical information is lost because people don’t speak up.
✏️ Manipulative Insincerity. Lacking both care and directness. It's usually an attempt to gain an advantage by exploiting another person's emotional state. Such interactions leave an unpleasant aftertaste. Staff no longer care about each other or the team's results. This is cultural bankruptcy.
✏️ Obnoxious Aggression. Criticism without care. The negative experience usually makes the person reject the message you were trying to deliver. It leads to toxic work cultures.
✏️ Radical Candor. Challenging, direct feedback with empathy. Criticism is given with care and a genuine wish to help. Don't hide problems - solve them together.

It's very important to follow radical candor principles to build a healthy team culture.

#booknook #softskills #leadership #communications


TechLead Bits

SLM evolution timeline from the SLMs: survey, measurements, and insights publication.

From the latest news: Microsoft has already released the Phi-4 SLM (Dec 2024), so the SLM domain is under active development.

#aibasics #news


TechLead Bits

Build Abstractions not Illusions

Creating abstractions is a key part of software development. We use abstractions at every level, whether it’s defining class hierarchies, setting component boundaries, or breaking down business processes. But how can we tell whether our abstractions are good enough? That's the main topic of Gregor Hohpe's talk Build Abstractions not Illusions.

Key points from the talk:
✏️ Abstractions are needed to hide implementation details and reduce cognitive load for teams
✏️ Abstraction is the foundation of any model, models help us make better architecture decisions
✏️ Abstractions provide a higher-level vocabulary that hides underlying complexity. An abstraction should not be a mere composition of other elements (e.g., EngineGearWheelsAssembly instead of "car")
✏️ If an abstraction hides essential details, it becomes an illusion. Illusions are dangerous because they mislead users about the real system structure or behavior
✏️ Too much detail means it’s a composition; too little detail makes it an illusion. The right level of abstraction is the golden balance between these two.

The talk suggests a really good model for thinking about abstractions on a scale: composition - abstraction - illusion. A practical tip for checking your abstractions is to ask: What details are essential? Can they affect the correctness of the model? If these details are missing, could that lead to misunderstandings about the system's properties?

#architecture


TechLead Bits

Word Embeddings

As we discussed, embeddings can differ depending on the task, but one common task is predicting the context of a word. This is the foundation of how large language models (LLMs) work. LLMs don’t actually understand human language; instead, they understand the numerical relationships between words.

Key properties of a good word embedding:
✔️ Similar words should have similar vectors.
✔️ Different words should have vectors that are far away from each other.

When each word or data point has a single embedding vector, this is called a static embedding. Static embeddings, once trained, contain some semantic information, especially in how words relate to each other. It means that word similarity depends on the data the model was trained on.

To address the limitations of static embeddings, contextual embeddings were developed. Contextual embeddings allow a word to have multiple representations based on its surrounding context. For example, the word orange can mean a fruit or a color, so it would have a different embedding for each meaning depending on the context.

TensorFlow offers a great tool called Projector where you can play with the `word2vec` static embedding model and see how the training data impacts word relationships.

Embeddings can be used for words, sentences, and even documents. An interesting feature built on top of that is context search: instead of searching for a particular word, you can search by its meaning and find a related document even if there are no exact matches.
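Here is a minimal sketch of the "similar words have similar vectors" property, using tiny hand-made 3-dimensional vectors (real embeddings like word2vec have hundreds of dimensions learned from data):

```python
import numpy as np

# Toy, hand-made "embeddings" just to illustrate the idea.
vectors = {
    "king":  np.array([0.90, 0.80, 0.10]),
    "queen": np.array([0.85, 0.75, 0.20]),
    "apple": np.array([0.10, 0.20, 0.90]),
}

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity: close to 1.0 = similar direction, close to 0 = unrelated."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

print(cosine(vectors["king"], vectors["queen"]))  # high: related words
print(cosine(vectors["king"], vectors["apple"]))  # low: unrelated words
```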

#aibasics


TechLead Bits

Balancing Coupling

Today we'll talk about the book Balancing Coupling in Software Design by Vlad Khononov. It's quite a fresh book (2024) that addresses a common architecture problem: how to balance coupling between components so that it's easy to support new features and technologies without turning the solution into a big ball of mud.

The author defines coupling as a relationship between connected entities. If entities are coupled, they can affect each other. As a result, coupled entities should be changed together.

Main reasons for change:
- Shared Lifecycle: build, test, deployment
- Shared Knowledge: model, implementation details, order of execution, etc.

The author defines 4 levels of coupling:
📍 Contract coupling. Modules communicate through an integration-specific contract.
📍 Model coupling. The same model of the business domain is used by multiple modules.
📍 Functional coupling. Modules share knowledge of the functionality: the sequence of steps to perform, sharing the same transaction, logic duplication.
📍 Intrusive coupling. Integration through a component's implementation details that were not intended for integration.

Coupling can be described by the following dimensions:
📍 Connascence. Shared lifecycle levels: static (compile-time) or dynamic (runtime) dependencies.
📍 Integration Strength. The more knowledge components share, the stronger the integration between them.
📍 Distance. The physical distance between components: the same class, the same package, the same lib, etc. The greater the distance, the more effort is needed to introduce a cascading change.
📍Volatility. How frequently the module is changed.

Then the author suggests a model to calculate coupling and other architecture characteristics using values of these dimensions.

For example,

Changes Cost = Volatility AND Distance 

It means that if both distance and volatility are high, the actual cost of changes is high.

Coupling balance equation:
Balance = (Strength XOR Distance) OR NOT Volatility
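Here is a toy sketch of how the balance equation could be evaluated, treating each dimension as a boolean high/low flag (my own simplification for illustration, not the author's code):

```python
def balance(strength: bool, distance: bool, volatility: bool) -> bool:
    """Balance = (Strength XOR Distance) OR NOT Volatility; True means 'high'."""
    return (strength != distance) or (not volatility)

# High strength + high distance + high volatility -> unbalanced coupling.
print(balance(strength=True, distance=True, volatility=True))   # False
# High strength but low distance (e.g., same module) -> balanced even if volatile.
print(balance(strength=True, distance=False, volatility=True))  # True
```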


Of course, the scale is relative and quite subjective, but it gives you a framework to assess your architectural decisions, predict their consequences, and adjust solution characteristics to find the right balance.

The overall impression of the book is very positive: it has no fluff, and it's clear, well structured, and very practical. Definitely recommended.

#booknook #engineering


TechLead Bits

Really nice illustration from "Thinking Like an Architect" that shows what it means to see more 👍


TechLead Bits

DR Strategies. My attempt to visualize main ideas 🙂

#architecture #systemdesign


TechLead Bits

RPO and RTO concepts visualization

Source: https://aws.amazon.com/blogs/mt/establishing-rpo-and-rto-targets-for-cloud-applications/


TechLead Bits

DR: Main Concepts

Over the last few months I've been working a lot on disaster recovery topics, so I decided to summarize the key points and patterns.

Disaster recovery (DR) is the ability to restore access to and functionality of IT services after a disaster event, whether it's natural or caused by human action (or error).

DR is usually designed in terms of Availability Zones and Regions:
- Availability Zone (AZ) – the minimal, atomic unit of geo-redundancy. It can be a whole data center (a physical building) or a smaller part like an isolated rack, floor, or hypervisor.
- Region - a set of Availability Zones within a single geographic area.

The most popular setups:
✏️ Public clouds. An AZ is a separate data center, and data centers are located within ~100 km of each other. The chance that all data centers fail at the same time is very low, so it's enough to distribute a workload across multiple AZs. Different regions may still make sense, but mostly for load and content distribution.
✏️ On-premise clouds. Here an AZ is usually represented by different floors or racks in the same building, so it's better to have at least 2 regions to cover DR cases.

A DR approach is measured by two metrics (a tiny worked example follows the list):
✏️ Recovery Time Objective (RTO) is the maximum acceptable delay between the interruption of a service and the restoration of service. It's how long your service can be unavailable.
✏️ Recovery Point Objective (RPO) is the maximum acceptable amount of time since the last data recovery point (e.g., backup). It's how much data you can lose in case of failure.
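Here is a tiny worked example of how these two numbers could be estimated (the figures are illustrative only, not from the source): with backups every 4 hours the worst-case data loss is 4 hours of writes (RPO), while the RTO is roughly the sum of the recovery steps.

```python
from datetime import timedelta

# Illustrative figures only: a backup schedule and estimated recovery steps.
backup_interval = timedelta(hours=4)
recovery_steps = {
    "provision infrastructure": timedelta(hours=2),
    "reinstall application":    timedelta(minutes=45),
    "restore data from backup": timedelta(hours=1),
}

worst_case_rpo = backup_interval                            # data since the last backup is lost
estimated_rto = sum(recovery_steps.values(), timedelta())   # total downtime

print(f"worst-case RPO: {worst_case_rpo}")  # 4:00:00
print(f"estimated RTO:  {estimated_rto}")   # 3:45:00
```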

The disaster recovery architecture is driven by the RTO and RPO requirements of a particular application. They are the first thing you should define before implementing any solution. In one of the next posts we'll look at DR implementation strategies.

#architecture #systemdesign


TechLead Bits

Shift Security Down

Last week the CNCF Kubernetes Policy Working Group released the Security "Shift Down" whitepaper. The main idea is to shift the security focus down to the platform layer.

By embedding security directly into the Kubernetes platform, rather than adding it as afterthought, we empower developers, operators, and security teams strengthening the software supply chain, simplifying compliance, and building more resilient and secure cloud-native environments.

said Poonam Lamba, co-chair of the CNCF Kubernetes Policy Working Group and a Product Manager at Google Cloud.

While Shift-Left Security emphasizes developer responsibility for security, Shift-Down Security focuses on integrating security directly into the platform, providing an environment that is secured by default.

Key elements of the Shift-Down Strategy:
✏️ Common security concerns are handled at the platform level rather than by business applications
✏️ Security is codified, automated, and managed as code
✏️ Platform security complements the Shift-Left approach and existing processes

The whitepaper provides a shared responsibility model across developers, operations, and security teams; introduces common patterns for managing vulnerabilities and misconfigurations; promotes automation and simplification; and enforces security best practices at the platform layer.

#engineering #security #news


TechLead Bits

Adopting ARM at Scale

Some time ago I wrote about infrastructure cost savings using Multi-Arch Images and the growing ARM-based competition between big cloud providers.

Interestingly, just last week Uber published an article about their big migration from on-premise data centers to Oracle Cloud and Google Cloud, integrating Arm-based servers for cost efficiency. The main challenge is migrating the existing infrastructure and around 5000 services to a multi-arch approach.

Uber team defined the following migration steps:
- Host Readiness. Ensure that host-level software is compatible with Arm.
- Build Readiness. Update build systems to support multi-arch images.
- Platform Readiness. Deployment system changes.
- SKU Qualification. Assess hardware reliability and performance.
- Workload Readiness. Migrate code repositories and container images to support Arm.
- Adoption Readiness. Test workloads on Arm architecture.
- Adoption. The final rollout. The team built an additional safety mechanism that reverts back to x86 if a service is deployed with a single-architecture image.

The migration is not fully finished yet, but the first services are already successfully built, scheduled, and running on Arm-based hosts. It looks like a really impressive achievement in migrating a huge infrastructure.
 
#engineering #news


TechLead Bits

Be Curious

To make the right decisions, technical leaders and architects need to understand not only the technical part of the project, but also its existing limitations, business, integrations, and legal and contractual restrictions. I call that the project context.

I usually recommend that all my mentees and more junior colleagues extend this context understanding. In response, they often ask: "Where can I read about this?" The first few times, this question really confused me. The problem is that there's no single document that describes absolutely everything about a project. Information is often fragmented and distributed across documents and teams, and a lot of things are not described at all.

So what can you do, and how can you extend your overall project knowledge? I spent some time reflecting on that, and the answer is simple—be curious. Ask questions, request specific documents, talk to people, and be interested in what’s happening around you. You can start with your manager, neighboring teams, and colleagues from other departments.

Over time, it will help you build a broad picture of the project, improve your business understanding, perform better trade-off analysis, and choose more efficient technical solutions.

#softskills #architecture


TechLead Bits

Vector Databases

The AI revolution has made vector databases very popular. They are now a fundamental building block of GenAI systems. So let's check what they are and how they differ from traditional databases.

A vector database is a database designed to store, manage, and index high-dimensional vector data (check Data Vectorization, Embeddings and Word Embeddings for more details). Unlike relational databases, this type of database works with unstructured data (embeddings for social media posts, images, audio, etc.) and can provide search results based on data similarity instead of exact matches (e.g., returning tomatoes, potatoes, and cucumbers for a "vegetables" search request).

The working pipeline consists of 3 steps:

✏️ Indexing. As with other data types, efficiently querying a large set of vectors requires an index. Main algorithms:
- Locality-Sensitive Hashing (LSH) – uses hashing to group similar vectors.
- Quantization – compresses data to speed up searches. Compression can lose some of the initial data, but it keeps the information that is vital for similarity operations.
- Graph-Based Algorithms – use nodes to represent vectors. They cluster the nodes and draw edges between similar nodes, creating hierarchical graphs. When a query is launched, the algorithm navigates the graph hierarchy to find the nodes containing the vectors most similar to the query vector.

✏️ Search. The system compares the query vector to the indexed vectors to find the closest matches:
- Exact Nearest Neighbor - measures the distance between the query and every stored vector; it's accurate but slow and requires a lot of computational resources
- Approximate Nearest Neighbor (ANN) - returns points whose distance is at most c times the distance from the query to its nearest points. It's cheaper and faster than the exact approach, and modern vector databases use ANN algorithms (a brute-force version of the search step is sketched after this list).

✏️ Post (or Pre)-processing. Additional steps may be applied to the search results: metadata filtering, re-ranking based on different similarity measures to improve accuracy.
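Here is a minimal sketch of the search step - brute-force exact nearest neighbor with cosine similarity over random toy vectors (real vector databases replace this linear scan with ANN indexes such as graph-based or quantization-based ones):

```python
import numpy as np

rng = np.random.default_rng(0)
# Toy "database": 1,000 random 64-dimensional embeddings with string ids.
ids = [f"doc-{i}" for i in range(1_000)]
db = rng.normal(size=(1_000, 64))
db /= np.linalg.norm(db, axis=1, keepdims=True)  # normalize once for cosine similarity

def search(query: np.ndarray, top_k: int = 5) -> list[tuple[str, float]]:
    """Exact nearest neighbor: score the query against every stored vector."""
    q = query / np.linalg.norm(query)
    scores = db @ q                          # cosine similarity with all vectors
    best = np.argsort(scores)[::-1][:top_k]  # indices of the top_k best matches
    return [(ids[i], float(scores[i])) for i in best]

print(search(rng.normal(size=64)))
```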

Popular open-source implementations:
- Opensearch
- Milvus
- Qdrant
- Neo4j
- Pgvector (based on Postgres)

Using LLMs without a vector database can be slow and inefficient because the model must process the full context every time. A vector database optimizes this by storing precomputed embeddings, enabling fast, efficient searches without repeatedly running the entire dataset through the model.

#engineering #aibasics


TechLead Bits

Radical Candor. Part 3: Get Stuff Done

The final part of the Radical Candor book overview is about making decisions (other parts: Build Relationships, Managed Growth).

The author suggests a framework for making decisions and supporting a healthy culture within the team - the Get Stuff Done Wheel. The wheel contains the following steps:
✏️ Listen.  You have to listen to the ideas that people have and create a culture where they listen to each other.
✏️ Clarify. You have to create space in which ideas can be clarified, to make sure these ideas don’t get crushed before everyone fully understands their potential usefulness.
✏️ Debate. You have to debate ideas and test them more rigorously.
✏️ Decide. Make a decision. Sometimes it's better to split debating and deciding into two separate sessions.
✏️ Persuade. You have to persuade those who weren’t involved in a decision (stakeholders) that it was a good one so that everyone can execute it effectively.
✏️ Execute. Implement the decision.
✏️ Learn. You have to learn from the results, whether or not you did the right thing, and start the whole process over again.

That’s a lot of steps but they should be quick. Not skipping a step and not getting stuck on one are equally important.

Of course, the book contains many more useful techniques and tools for working with people. I would say it's the top soft-skills book I read last year. I recommend it to everyone, because good feedback is the foundation of trust in a team—both from the leader and from all team members.

#booknook #softskills #leadership #communications


TechLead Bits

My illustrations may not be ideal, but the concept is very important, so here is a more classic matrix representation from the book


TechLead Bits

Radical Candor

One of the most useful and practical books that I read last year - Radical Candor: Be a Kick-Ass Boss Without Losing Your Humanity by Kim Scott.

Kim Scott is a really experienced manager: she led teams at Google, taught general management at Apple University, and coached CEOs at Dropbox, Qualtrics, Twitter, and other tech companies. So she knows what she is talking about.

Let's start with a sketchnote I prepared for the book to memorize the key ideas. In the next posts I'll explain them in more detail.

#booknook #softskills #leadership #communications #sketchnote


TechLead Bits

Small Language Models

While large language models (LLMs) are booming and discussed everywhere, there’s another important trend in the AI world—Small Language Models (SLMs).

A small language model (SLM) is a language model similar to a large language model but with a significantly reduced number of parameters. SLMs typically range from a few million to a few billion parameters, while LLMs have hundreds of billions or even trillions. For example, GPT-3 has 175 billion parameters, whereas Microsoft’s Phi-2, an SLM, has just 2.7 billion.

Main techniques to train SLMs:
✏️ Knowledge Distillation. A smaller model (the "student") learns from a bigger, already-trained model (the "teacher"). The student is trained not only to match the teacher's predictions but also to mimic its underlying reasoning process. Typically the teacher model's weights are frozen and cannot be changed during the distillation process (see the sketch after this list).
✏️ Pruning. This is the process of removing the extra bits of the model that aren't really needed, making it smaller and faster without losing too much accuracy.
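As a rough sketch of the distillation objective (my own minimal numpy version, not from the post): the student is trained to match the teacher's temperature-softened output distribution, typically via a KL-divergence term.

```python
import numpy as np

def softmax(logits: np.ndarray, temperature: float = 1.0) -> np.ndarray:
    z = logits / temperature
    z = z - z.max()              # numerical stability
    e = np.exp(z)
    return e / e.sum()

def distillation_loss(student_logits: np.ndarray,
                      teacher_logits: np.ndarray,
                      temperature: float = 2.0) -> float:
    """KL(teacher || student) over temperature-softened distributions."""
    p_teacher = softmax(teacher_logits, temperature)  # frozen teacher's "soft labels"
    p_student = softmax(student_logits, temperature)
    return float(np.sum(p_teacher * np.log(p_teacher / p_student)))

teacher = np.array([4.0, 1.0, 0.2])   # confident teacher prediction for one token
student = np.array([2.5, 1.5, 0.5])   # student still catching up
print(distillation_loss(student, teacher))  # smaller value = closer imitation
```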

Key advantages:
✔️ Resource Efficiency. SLMs are more compact and require fewer resources, which makes them suitable for deployment on small devices.
✔️Cost-Effectiveness. They are much cheaper to train and deploy compared to LLMs.
✔️ Customization. SLMs can be fine-tuned on specific datasets, making them highly efficient for specialized tasks in particular industries.
✔️ Security. SLMs can be deployed locally or in private cloud environments, keeping sensitive data under organizational control.

There’s no one-size-fits-all solution when it comes to AI adoption. Every business will focus on efficiency and select the best and most cost-effective tool to get the job done properly. Architects should carefully select the right-sized model for each project based on its goals and constraints.

#aibasics


TechLead Bits

Large Language Models

In the previous post we learned how words can be transformed into feature vectors. However, language models don’t actually work with full words—they work with tokens. A token is the smallest unit the model works with; it can be a word, a subword, or even a single character (e.g., a punctuation symbol).

For example, the word unpredictable can be broken into the following tokens:
- un (prefix)
- predict (root)
- able (suffix)

A language model is a model that estimates the probability of a token or a sequence of tokens appearing within a longer sequence of tokens.

Example:
When I come home after work, I _________ in the kitchen.
Possible predictions:
- "cook dinner"
- "drink tea"
- "watch TV"

Each option has a different probability, and the model’s task is to pick the tokens with the highest probability to complete the sentence (see the toy example below).
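Here is a toy illustration of that step (the scores are made up): given logits for a few candidate continuations, softmax turns them into probabilities, and the model picks or samples the most likely one.

```python
import numpy as np

# Made-up logits for candidate continuations of
# "When I come home after work, I ___ in the kitchen."
candidates = ["cook dinner", "drink tea", "watch TV"]
logits = np.array([2.1, 1.3, -0.5])

probs = np.exp(logits) / np.exp(logits).sum()  # softmax -> probabilities
for text, p in zip(candidates, probs):
    print(f"{text!r}: {p:.2f}")

print("prediction:", candidates[int(np.argmax(probs))])  # 'cook dinner'
```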

One more important element of the LLM is context. Context is helpful information before or after the target token. For example, context can help determine whether the word "orange" refers to a fruit or a color.

When a language model has a large number of parameters (billions of parameters), it’s called a large language model (LLM). LLMs are powerful enough to predict entire paragraphs or even entire essays.

Extending the example above, it would look like this:
User's question: What can I do in the kitchen after work?
LLM Response: _____

There is no magic - just math, probabilities, and predictions. So if you hear that an LLM performs scientific reasoning about some problem, it's not real reasoning; it's just advanced autocomplete for a given prompt.

#aibasics


TechLead Bits

Inspired: How to Create Tech Products Customers Love

"Developing a product mindset is essential for any engineer, and it becomes mandatory when aiming for an engineering leadership role." I read that in Hybrid Hacker newsletter about engineering leader roadmap some time ago.

So I decided to improve my product mindset and understand what product managers really do 😉 with one of the most popular books in that area - Inspired: How to Create Tech Products Customers Love by Marty Cagan.

Key thoughts from the book:
✏️ Product teams should have enough autonomy and a wide set of responsibilities. Teams should own the product they develop.
✏️ A product should have an inspiring vision. The vision is what the business wants to achieve.
✏️ A product should have a strategy. The strategy describes how the vision will be achieved.
✏️ Product teams should have the correct business context to make the right decisions.
✏️ Teams should use fast and cheap prototypes to test business ideas.
✏️ If the company uses OKRs, they should be set for teams, not for individuals.
✏️ Continuous product research and innovation are the foundation of business success.

Principles for a good product vision:
✔️ Always start with Why. Define the main product goal.
✔️ Focus on the problem not the solution.
✔️ Be ambitious.
✔️ Be ready to break something old to build something new.
✔️ Be inspirational. Your vision should excite people, they should want to be a part of the product.
✔️ Be aligned with current industry trends.
✔️ Focus on the future.
✔️ Allow flexibility in details.
✔️ Require extra effort to achieve. If the vision is easy to achieve, it may not be ambitious enough.
✔️ Evangelism. Regularly explain product vision to the teams and stakeholders.

Principles for a good product strategy:
✔️ Focus on one user profile or business niche at a time.
✔️ Align with the overall company's strategy.
✔️ Align with the marketing strategy.
✔️ Focus on customers, not competitors.
✔️ Share the strategy with the teams.

The principles and recommendations about product vision and strategy were the most valuable parts of the book for me. Other chapters felt a bit trivial - using prototypes, making them cheap and fast, not turning an MVP into the real product, knowing your customers, etc. Anyway, the book is really good for improving your understanding of product management.

#booknook #product #leadership
