Microservices Architecture: When It Makes Sense and How to Do It Right

The hidden cost nobody mentions upfront

A well-built monolith needs: one deploy process, one database, one logging system, one monitoring system. The same functionality in 10 microservices needs: 10 deploy pipelines, 10 separate databases or schemas, log correlation across 10 services, 10 sets of alerts, and a communication layer between all of them that in the monolith was simply a function call.

Amazon, Netflix, and Spotify adopted microservices when they had hundreds of engineers and the monolith was the actual bottleneck. The trap is replicating their architecture with a 5-10 person team, where operational overhead consumes most of the development time.

When microservices DO make sense

Multiple teams working on the same product, with well-defined domains and different deployment cadences. Microservices let the payments team deploy independently from the catalog team.
Components with fundamentally different technical requirements: the video processing service needs GPUs and scales to thousands of ephemeral instances; the user management service is lightweight and runs under a steady load.
Differential scaling: the search engine needs to scale to 100x during high-traffic events; the account configuration module handles 10 requests per hour.
The monolith has real scaling or deployment problems causing incidents — not projected problems used to justify complexity from the start.

Domain-Driven Design: the foundation of correct design

The most common mistake when decomposing a monolith into microservices is defining services by technical layer or by single entity (user-service, data-service, email-service) instead of by business domain. Services cut this way are tightly coupled: to add one checkout feature, you have to change and deploy user-service, product-service, order-service, and payment-service together.

```text
# Bounded Contexts identified by business domain
┌─────────────────────┐  ┌─────────────────────┐  ┌─────────────────────┐
│   Identity Context  │  │   Catalog Context   │  │   Orders Context    │
│   ────────────────  │  │   ────────────────  │  │   ────────────────  │
│   User              │  │   Product           │  │   Order             │
│   Session           │  │   Category          │  │   LineItem          │
│   Permission        │  │   Inventory         │  │   Applied price     │
└─────────────────────┘  └─────────────────────┘  └─────────────────────┘
         │                         │                        │
         └──────── Event Bus (Kafka/RabbitMQ) ─────────────┘
# Services do NOT share databases
# Communicate via events for async operations
# Communicate via API for sync queries
```
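The three rules above can be sketched in a few lines. This is a minimal in-process illustration, not a production design: the `EventBus` class stands in for Kafka or RabbitMQ, and `CatalogContext`, `OrdersContext`, and the `OrderCreated` event shape are hypothetical names chosen for the example.

```typescript
// Hypothetical event shape for this sketch.
type OrderItem = { sku: string; quantity: number };
type OrderCreated = { type: 'OrderCreated'; orderId: string; items: OrderItem[] };
type Handler = (event: OrderCreated) => void;

// In-memory stand-in for a broker. Delivery here is synchronous,
// which a real Kafka/RabbitMQ setup would not be.
class EventBus {
  private handlers: Handler[] = [];

  subscribe(handler: Handler): void {
    this.handlers.push(handler);
  }

  publish(event: OrderCreated): void {
    for (const handler of this.handlers) handler(event);
  }
}

// Catalog context: the only owner of inventory data.
class CatalogContext {
  private stock: Map<string, number>;

  constructor(bus: EventBus, initialStock: [string, number][]) {
    this.stock = new Map(initialStock);
    // Async integration: react to events published by other contexts.
    bus.subscribe((event) => {
      for (const item of event.items) {
        this.stock.set(item.sku, this.getStock(item.sku) - item.quantity);
      }
    });
  }

  // Sync integration: a query API instead of shared tables.
  getStock(sku: string): number {
    return this.stock.get(sku) ?? 0;
  }
}

// Orders context: owns orders and reaches Catalog only through its API and the bus.
class OrdersContext {
  private orders = new Map<string, OrderItem[]>();
  private bus: EventBus;
  private catalog: CatalogContext;

  constructor(bus: EventBus, catalog: CatalogContext) {
    this.bus = bus;
    this.catalog = catalog;
  }

  placeOrder(orderId: string, items: OrderItem[]): boolean {
    const available = items.every((i) => this.catalog.getStock(i.sku) >= i.quantity);
    if (!available) return false;
    this.orders.set(orderId, items);
    this.bus.publish({ type: 'OrderCreated', orderId, items });
    return true;
  }
}
```

Neither class touches the other's map. Orders asks Catalog a question through `getStock` and announces what happened through the bus, which is the same split as the sync and async lines in the diagram.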

Sync vs. async communication: the most important decision

Synchronous communication (REST/gRPC): one service waits for the other's response before continuing. It is easier to debug and gives immediate consistency, but it creates temporal coupling: if the destination service is down or slow, the calling service fails too unless it has a timeout and a fallback.
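One common way to limit this temporal coupling is to bound every synchronous call with a timeout and a fallback value. The sketch below is a minimal version under stated assumptions: `callWithFallback` is a hypothetical helper, and `call` stands for whatever REST or gRPC client function the service already has.

```typescript
// Resolves with the call's result, or with `fallback` if the call rejects
// or does not settle within `timeoutMs`.
function callWithFallback<T>(
  call: () => Promise<T>,
  timeoutMs: number,
  fallback: T,
): Promise<T> {
  return new Promise<T>((resolve) => {
    // Destination too slow: stop waiting and answer with the fallback.
    const timer = setTimeout(() => resolve(fallback), timeoutMs);

    // Promise.resolve().then(call) also turns a synchronous throw into a rejection.
    Promise.resolve()
      .then(call)
      .then(
        (value) => {
          clearTimeout(timer);
          resolve(value);
        },
        () => {
          // Destination failed outright: answer with the fallback.
          clearTimeout(timer);
          resolve(fallback);
        },
      );
  });
}
```

A caller in the Orders context could wrap a price lookup this way and fall back to a cached price. Whether a stale value is acceptable is a business decision, not a technical one.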

Asynchronous communication (events via Kafka/RabbitMQ): the emitter publishes an event and continues without waiting. The consumer processes the event in its own time. More resilient, allows scaling consumers independently, but debugging is more complex and consistency is eventual.

```typescript
// Outbox pattern: the event is recorded whenever the order is.
// Assumes Sequelize-style models (Order, OutboxEvent) and a CreateOrderDto
// type defined elsewhere; the caller opens and commits the transaction.
async function createOrder(data: CreateOrderDto, trx: Transaction) {
  // 1. Create the order in the database
  const order = await Order.create(data, { transaction: trx });

  // 2. Persist the event in the outbox table (same transaction)
  await OutboxEvent.create({
    aggregate_type: 'Order',
    aggregate_id: order.id,
    event_type: 'OrderCreated',
    payload: JSON.stringify(order.toJSON()),
  }, { transaction: trx });

  // If the transaction commits, the order and its event are stored together.
  // A separate worker reads the outbox table and publishes to Kafka.
  return order;
}
```

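Publishing directly to Kafka inside a database transaction is a classic mistake, because the two systems cannot commit atomically. If Kafka is down, the transaction fails and the data isn't persisted. If the database commit fails after Kafka has accepted the message, the event is in Kafka but the operation never completed. The Outbox pattern avoids both cases: the event is written in the same transaction as the data and published afterwards, at least once, so consumers should tolerate duplicates.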
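The separate worker that drains the outbox table could look roughly like this. It is a sketch over an in-memory array rather than a real table: the `id` and `published` columns and the `Publish` signature are assumptions, and with Kafka the `publish` function would wrap a producer send.

```typescript
// Row shape follows the outbox insert above; `id` and `published` are assumed columns.
type OutboxRow = {
  id: number;
  event_type: string;
  payload: string;
  published: boolean;
};

// Hypothetical publisher signature.
type Publish = (topic: string, payload: string) => Promise<void>;

// Publishes pending rows in insertion order and marks each one only after
// the broker accepted it. Returns how many rows were published in this pass.
async function relayOutbox(rows: OutboxRow[], publish: Publish): Promise<number> {
  let count = 0;
  for (const row of rows) {
    if (row.published) continue;
    try {
      await publish(row.event_type, row.payload);
    } catch {
      break; // broker unavailable: keep the order, retry on the next pass
    }
    row.published = true; // in a real table: an UPDATE on this row
    count += 1;
  }
  return count;
}
```

If the worker crashes after `publish` succeeds but before the row is marked, the next pass sends the same event again. That is why this design delivers at least once and why consumers should deduplicate, for example by event `id`.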

Service Mesh: when to add Istio or Linkerd

A service mesh adds a proxy layer between all microservices that manages mTLS (encryption and authentication between services), traffic management (retries, circuit breaking, load balancing), and observability (latency and error metrics per service pair). When to add it: once you run more than 5-6 microservices and either security requirements (in-transit encryption, service-to-service authentication) or traffic management complexity justify the operational overhead.
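To make "circuit breaking" concrete, here is the behavior written as application code. This is only an illustration of the idea: a mesh proxy applies its own policies outside the service, and the `CircuitBreaker` class, its thresholds, and the injectable `now` clock are assumptions made for the sketch.

```typescript
type BreakerState = 'closed' | 'open' | 'half-open';

class CircuitBreaker {
  private failures = 0;
  private openedAt = 0;
  private state: BreakerState = 'closed';
  private threshold: number;
  private cooldownMs: number;
  private now: () => number;

  // `now` is injectable so the cooldown can be tested without waiting.
  constructor(threshold: number, cooldownMs: number, now: () => number = Date.now) {
    this.threshold = threshold;
    this.cooldownMs = cooldownMs;
    this.now = now;
  }

  getState(): BreakerState {
    // After the cooldown, allow one trial call through.
    if (this.state === 'open' && this.now() - this.openedAt >= this.cooldownMs) {
      this.state = 'half-open';
    }
    return this.state;
  }

  async call<T>(fn: () => Promise<T>): Promise<T> {
    if (this.getState() === 'open') {
      throw new Error('circuit open: call rejected without reaching the service');
    }
    try {
      const result = await fn();
      this.failures = 0;
      this.state = 'closed';
      return result;
    } catch (err) {
      this.failures += 1;
      // A failed trial call, or too many consecutive failures, opens the circuit.
      if (this.state === 'half-open' || this.failures >= this.threshold) {
        this.state = 'open';
        this.openedAt = this.now();
      }
      throw err;
    }
  }
}
```

Writing and tuning this in every service, in every language the teams use, is the cost a mesh removes. With only a handful of services, a shared library like this one may be enough.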

The nano-microservice anti-pattern

One of the most destructive anti-patterns is decomposing down to the function level. A service that exposes only 2-3 endpoints with trivial logic brings none of the benefits of microservices; it only adds network latency, deployment complexity, and one more service to monitor. The practical rule: if a service can't justify its own CI/CD pipeline and its own database or schema, it's probably too small.

Frequently Asked Questions

How do I migrate from a monolith to microservices without stopping the business?

Apply the Strangler Fig pattern. Identify the first module to extract (typically the one whose scaling needs differ most from the rest, or the one that blocks the most teams), extract it into a separate service with a well-defined API, and gradually redirect traffic to it while the monolith remains as the fallback. The key is to keep the shared database at first and move the module's data into the new service's own datastore in a second phase.
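The gradual redirect usually lives in a routing layer in front of both systems. A minimal sketch of that decision follows, assuming a percentage-based rollout per path prefix; `RolloutRule`, `bucketFor`, `chooseBackend`, and `routeRequest` are hypothetical names, not part of any specific gateway.

```typescript
type Backend = 'monolith' | 'new-service';

// Hypothetical rollout rule: which path prefix is being extracted and
// what share of its traffic (0-100) already goes to the new service.
type RolloutRule = { pathPrefix: string; percentToNewService: number };

// Deterministic bucket in [0, 100) so the same user always hits the same backend.
function bucketFor(userId: string): number {
  let hash = 0;
  for (let i = 0; i < userId.length; i++) {
    hash = (hash * 31 + userId.charCodeAt(i)) % 100000;
  }
  return hash % 100;
}

function chooseBackend(path: string, userId: string, rules: RolloutRule[]): Backend {
  const rule = rules.find((r) => path.startsWith(r.pathPrefix));
  if (!rule) return 'monolith'; // not extracted yet
  return bucketFor(userId) < rule.percentToNewService ? 'new-service' : 'monolith';
}

// Sends the request to the chosen backend; if the new service fails,
// the monolith answers instead.
async function routeRequest<T>(
  path: string,
  userId: string,
  rules: RolloutRule[],
  backends: Record<Backend, (path: string) => Promise<T>>,
): Promise<T> {
  if (chooseBackend(path, userId, rules) === 'monolith') {
    return backends['monolith'](path);
  }
  try {
    return await backends['new-service'](path);
  } catch {
    return backends['monolith'](path);
  }
}
```

Raising `percentToNewService` from 0 to 100 over several releases is the "gradual" part. The automatic fallback is only safe for requests that can be repeated without side effects, so writes typically need more care than reads.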

Microservices or modular monolith for a startup?

Modular monolith almost always for a startup. A well-modularized monolith (modules with clean interfaces, no direct coupling between modules) can evolve to microservices when the team and scale justify it — extracting modules with already well-defined boundaries. Day-one microservices in a startup is premature optimization that consumes engineering time that should go to the product.

How do I handle transactions that span multiple microservices?

Distributed transactions (2PC) are extremely difficult to implement correctly in microservices. The recommended pattern is Saga: each service executes its local transaction and publishes an event. If any step fails, previous services execute compensating transactions to revert. This requires all operations to have a defined inverse operation.
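A compact way to see the compensation logic is an orchestrated saga, where one coordinator calls each step in turn. Note that this is a variant of the event-driven flow described above, chosen because it fits in one function. `SagaStep` and `runSaga` are hypothetical names.

```typescript
type SagaStep = {
  name: string;
  execute: () => Promise<void>;    // the service's local transaction
  compensate: () => Promise<void>; // the inverse operation for that step
};

type SagaResult = { ok: true } | { ok: false; failedStep: string };

// Runs steps in order. If one fails, the steps that already completed are
// compensated in reverse order and the failing step is reported.
async function runSaga(steps: SagaStep[]): Promise<SagaResult> {
  const completed: SagaStep[] = [];
  for (const step of steps) {
    try {
      await step.execute();
      completed.push(step);
    } catch {
      for (const done of completed.reverse()) {
        await done.compensate();
      }
      return { ok: false, failedStep: step.name };
    }
  }
  return { ok: true };
}
```

For a checkout with the steps reserve inventory, charge payment, create shipment, a declined card would trigger only the inventory release. The sketch assumes compensations always succeed; a real implementation has to persist saga state and retry compensations that fail.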

How do I do integration testing between microservices?

Contract testing (Pact) to verify that API consumers and providers are compatible without needing a full integration environment. E2E tests in a staging environment with all services deployed for critical flows. Integration tests that spin up all microservices locally are slow and brittle — contract testing is more pragmatic.
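The idea behind contract testing fits in a small hand-rolled check. To be clear, this is not Pact's API: `Contract` and `verifyContract` are hypothetical, and a tool like Pact adds contract exchange between builds, versioning, and request matching on top of the same principle.

```typescript
type FieldType = 'string' | 'number' | 'boolean';

// What the consumer relies on: field names and primitive types, nothing more.
type Contract = { [field: string]: FieldType };

// Returns a list of violations. An empty list means the provider response
// still satisfies this consumer. Extra fields are allowed on purpose.
function verifyContract(contract: Contract, response: { [key: string]: unknown }): string[] {
  const violations: string[] = [];
  for (const field of Object.keys(contract)) {
    const expected = contract[field];
    const actual = typeof response[field];
    if (!(field in response)) {
      violations.push(`missing field: ${field}`);
    } else if (actual !== expected) {
      violations.push(`${field}: expected ${expected}, got ${actual}`);
    }
  }
  return violations;
}
```

The consumer team owns the contract and the provider team runs the check in its own pipeline against a real response. A provider can then add fields freely, but renaming or retyping a field that some consumer depends on fails the build before deployment.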

What database does each microservice use?

Each microservice should have its own datastore: the Database per Service principle. The storage technology can vary by service: relational PostgreSQL for the catalog service, Elasticsearch for search, Redis for sessions, Kafka as the event log for the events service. Sharing a database between microservices is among the most destructive anti-patterns because it creates schema-level coupling that makes independent deployment impossible.

Is your company evaluating migrating to microservices? We can do the architectural assessment and design the right migration strategy for your context.

