How I Optimized a Spring Boot Application to Handle 1M Requests/Second

Scaling a Spring Boot application to handle 1 million requests per second might sound like an impossible feat, but with the right strategies, it’s absolutely achievable. Here’s how I did it:

1️⃣ Understand Your Bottlenecks

Before diving into optimization, I conducted a thorough performance analysis using tools like JProfiler and New Relic. This helped identify the critical bottlenecks:

  • High response times for certain APIs
  • Slow database queries
  • Thread contention in critical parts of the application

💡 Lesson Learned: Always measure before optimizing. Guesswork can lead to wasted effort.

2️⃣ Implement Reactive Programming

Switching to Spring WebFlux for critical parts of the application enabled a non-blocking, reactive architecture. This significantly reduced thread usage, allowing the server to handle more concurrent requests.
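
As a minimal sketch of what that looks like: the controller below returns a Mono, so the request thread is released while the lookup is in flight instead of being parked. The Order and OrderRepository types here are illustrative placeholders (in a real app they would be a domain class and a reactive Spring Data repository), not the actual classes from my project:

```java
import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.PathVariable;
import org.springframework.web.bind.annotation.RestController;
import reactor.core.publisher.Mono;

// Placeholder types so the sketch compiles on its own.
record Order(String id, String status) {}

interface OrderRepository {
    Mono<Order> findById(String id);
}

@RestController
public class OrderController {

    private final OrderRepository repository;

    public OrderController(OrderRepository repository) {
        this.repository = repository;
    }

    // Returning Mono<Order> keeps the call non-blocking: the Netty event-loop
    // thread is freed up while the lookup completes, so far fewer threads are
    // needed to serve the same number of concurrent requests.
    @GetMapping("/orders/{id}")
    public Mono<Order> findOrder(@PathVariable String id) {
        return repository.findById(id);
    }
}
```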

3️⃣ Optimize Database Queries

Database performance was a huge bottleneck. Here’s what worked:

  • Query Optimization: Rewrote complex queries, added proper indexes, and avoided N+1 queries using Hibernate’s @BatchSize.
  • Caching: Leveraged Redis to cache frequently accessed data, cutting down repetitive database hits (a minimal sketch follows this list).
  • Connection Pooling: Tuned HikariCP settings to efficiently handle high traffic.
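
To give a feel for the caching piece, here is a minimal sketch, assuming spring-boot-starter-data-redis is on the classpath, @EnableCaching is declared, and Redis is configured as the cache provider. Product and ProductRepository are illustrative placeholders, not the real entities:

```java
import java.util.Optional;

import org.springframework.cache.annotation.Cacheable;
import org.springframework.stereotype.Service;

// Placeholder types for the sketch.
record Product(long id, String name) {}

interface ProductRepository {
    Optional<Product> findById(long id);
}

@Service
public class ProductService {

    private final ProductRepository repository;

    public ProductService(ProductRepository repository) {
        this.repository = repository;
    }

    // With Redis as the cache backend, repeated lookups for the same id are
    // served from the cache instead of hitting the database again.
    @Cacheable(cacheNames = "products", key = "#id")
    public Product findProduct(long id) {
        return repository.findById(id).orElseThrow();
    }
}
```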

4️⃣ Tune Thread Pool and Connection Limits

Fine-tuning thread pools and connection limits in Tomcat and Netty (used by WebFlux) was a game changer:

  • Tuned the spring.task.execution.pool.* settings for async tasks (a programmatic sketch follows this list).
  • Increased Netty’s connection limits and optimized worker threads.
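
For reference, the bean below is the programmatic counterpart of the spring.task.execution.pool.* properties. The pool sizes are illustrative only; the right numbers depend on your workload and hardware:

```java
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.scheduling.concurrent.ThreadPoolTaskExecutor;

@Configuration
public class AsyncPoolConfig {

    // Sizes here are placeholders; derive yours from profiling, not guesswork.
    @Bean(name = "appTaskExecutor")
    public ThreadPoolTaskExecutor appTaskExecutor() {
        ThreadPoolTaskExecutor executor = new ThreadPoolTaskExecutor();
        executor.setCorePoolSize(50);
        executor.setMaxPoolSize(200);
        executor.setQueueCapacity(1_000);
        executor.setThreadNamePrefix("async-");
        executor.initialize();
        return executor;
    }
}
```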

5️⃣ Leverage CDN and Load Balancers

To distribute the load, I:

  • Integrated a CDN (like Cloudflare) to cache static assets.
  • Used a load balancer (NGINX + AWS ALB) to distribute traffic across multiple app instances.

6️⃣ Optimize Serialization, Compression, and Caching

Switching to Kryo serialization for data transfer and enabling GZIP compression for responses significantly reduced payload sizes and improved response times. Additionally, strategic use of caching for intermediate computations and temporary data further enhanced performance.
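
The GZIP side is a one-liner (server.compression.enabled=true in application.properties). For the Kryo side, the helper below is a rough sketch that round-trips an object to bytes; the class name and the unregistered-classes shortcut are simplifications for illustration, not the production setup:

```java
import com.esotericsoftware.kryo.Kryo;
import com.esotericsoftware.kryo.io.Input;
import com.esotericsoftware.kryo.io.Output;

import java.io.ByteArrayOutputStream;

public final class KryoCodec {

    // Kryo instances are not thread-safe, so keep one per thread.
    private static final ThreadLocal<Kryo> KRYO = ThreadLocal.withInitial(() -> {
        Kryo kryo = new Kryo();
        // Simpler for a sketch; register your classes explicitly in production.
        kryo.setRegistrationRequired(false);
        return kryo;
    });

    private KryoCodec() {
    }

    public static byte[] serialize(Object value) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (Output output = new Output(buffer)) {
            KRYO.get().writeClassAndObject(output, value);
        }
        return buffer.toByteArray();
    }

    public static Object deserialize(byte[] bytes) {
        try (Input input = new Input(bytes)) {
            return KRYO.get().readClassAndObject(input);
        }
    }
}
```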

7️⃣ Adopt Horizontal Scaling

I deployed the app in a containerized environment using Kubernetes:

  • Added autoscaling rules to spin up more pods during traffic surges.
  • Used Istio for traffic shaping and resilience.

8️⃣ Load Test with Gatling and Apache JMeter

Using tools like Gatling and Apache JMeter, I simulated real-world traffic. Stress testing helped identify weak spots before deploying to production.
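
For a sense of what such a load script can look like, here is a minimal Gatling sketch using its Java DSL (Gatling 3.7+). The base URL, endpoint, and injection profile are illustrative, not the actual test plan:

```java
import static io.gatling.javaapi.core.CoreDsl.*;
import static io.gatling.javaapi.http.HttpDsl.*;

import java.time.Duration;

import io.gatling.javaapi.core.ScenarioBuilder;
import io.gatling.javaapi.core.Simulation;
import io.gatling.javaapi.http.HttpProtocolBuilder;

public class OrderLookupSimulation extends Simulation {

    HttpProtocolBuilder httpProtocol = http
            .baseUrl("http://localhost:8080") // illustrative target
            .acceptHeader("application/json");

    ScenarioBuilder scn = scenario("Order lookups under ramping load")
            .exec(http("get order")
                    .get("/api/orders/42")
                    .check(status().is(200)));

    {
        // Ramp the arrival rate over two minutes to find the breaking point.
        setUp(scn.injectOpen(rampUsersPerSec(100).to(5_000).during(Duration.ofMinutes(2))))
                .protocols(httpProtocol);
    }
}
```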

🌟 The Result

With these optimizations, our Spring Boot application went from struggling under 100K requests/second to consistently handling 1M requests/second with low latency and high reliability.

Key Takeaway

Performance optimization is not about finding one magic solution — it’s a combination of small, targeted improvements that align with your specific bottlenecks. By measuring, iterating, and testing thoroughly, even the most ambitious scalability goals can be achieved.
