Rate limiting: Why it’s used and how it works

With more and more people online, it's essential for websites and applications to have security systems that operate automatically behind the scenes. Tools like rate limiting help defend against threats such as bot abuse, excessive user requests, and coordinated attacks designed to overwhelm a site. In this guide, you'll learn how rate limiting works and how it helps protect a website from attacks.

Apr 25, 2025

11 min read

What is rate limiting?

Rate limiting is a safety net that restricts network traffic to control the flow of requests to a specific site. Put simply, it caps the number of requests a user or client can make to a particular endpoint and throttles or blocks further requests when that cap is exceeded. This control matters because it prevents abuse of a site: as a security tool, rate limiting curbs malicious activity and protects the network behind it.

Rate limiters typically help keep operational costs in check, avoid resource starvation, and secure APIs against malicious users. Rate limits may be implemented on the client side, on the server side, or as middleware.

Why is rate limiting important?

Rate limiting protects both sites and users by controlling the number of requests and slowing them down to prevent abuse and maintain application security and stability. Let’s take a look at how rate limiting protects various parties. 

Prevents abuse and overloading

Rate limiting offers several security benefits that help protect systems from various forms of abuse and overload, including:

  • DoS/DDoS attacks. Rate limiting is a common defense against denial-of-service attacks, which are engineered to overwhelm a network or server with a high volume of requests, disrupting the service and making it unavailable to legitimate users. Limiting the rate of requests makes it much harder for attackers to pull off a DoS attack and also blunts brute-force attempts.
  • Bot attacks. Malicious actors may use bots to perform repetitive tasks. Bot attacks are typically employed by hackers who want to spread malware or scrape information. With tools like rate limiting, networks can reduce the number of requests to protect themselves from being overwhelmed. Other security measures, such as CAPTCHA, allow users to log in to sites while preventing access to bots.
  • API abuse. A key role of rate limiting is to curb API abuse, working alongside authorization and authentication steps that verify the identity of the user or system making the API request and confirm it has the required permissions to access specific resources.

Protects against cyberattacks

Rate limiting also plays a vital role in defending against common cyberattacks by restricting malicious activity before it causes harm, such as:

Brute-force attacks/Credential stuffing

Brute-force attacks don't start from a list of stolen user credentials; instead, bots systematically generate and try username and password combinations until one works. Credential stuffing, by contrast, replays credentials leaked in earlier breaches. In either case, once attackers get in, they steal as much information as they can to use later on, and capping login attempts per account or IP address slows both attacks dramatically.

Data scraping and theft

Attackers often target websites to harvest information they can sell or use to undercut competitors. For example, a scraper might pull pricing information from an ecommerce company and use it to gain an advantage. By capping how many pages or API calls a single client can request in a given period, rate limiting helps prevent data scraping.

Ensures system stability and performance

Rate limiting solutions allow a network or server to safely process requests at a high volume by ensuring they don’t exceed capacity. This feature ensures performance is consistent and resources are distributed fairly. 

Improves user experience

Tools such as rate limiting can help improve user experience. By preventing overload, it helps maintain consistent response times. 

Rate limiting also helps reduce costs by blocking the overuse of resources. When a resource experiences a high volume of requests, it may require a higher capacity to handle them, which may result in additional costs. Organizations can reduce this occurrence by rate limiting service requests. 

What does it mean when you are rate limited?

When users encounter a rate limit, they can't repeat an action within a specific time frame. For example, if a user tries to log in too many times within a short period, the rate limiting solution blocks further attempts for an allotted amount of time and only lets them through again once enough time has passed.
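From a client's perspective, a rate limit usually shows up as an HTTP 429 "Too Many Requests" response, often accompanied by a Retry-After header saying how long to wait. Below is a minimal Python sketch of how a client might honor that signal; the endpoint URL is a hypothetical placeholder, and the code assumes Retry-After is expressed in seconds.

```python
import time
import requests  # third-party HTTP client

def fetch_with_retry(url, max_attempts=3):
    """Retry a GET request when the server answers with 429 Too Many Requests."""
    for attempt in range(max_attempts):
        response = requests.get(url)
        if response.status_code != 429:
            return response
        # Many rate limiters send a Retry-After header (assumed here to be in seconds);
        # fall back to a short fixed delay if it's missing.
        wait_seconds = int(response.headers.get("Retry-After", 5))
        time.sleep(wait_seconds)
    raise RuntimeError("Still rate limited after several retries")

# Hypothetical endpoint, for illustration only:
# fetch_with_retry("https://api.example.com/login")
```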

How does rate limiting work?

Rate limiting most commonly works by counting the number of requests made from a single IP address within a fixed window of time, rather than measuring the time between individual requests. If an IP address submits too many requests within that window, the rate limiting solution suppresses it and rejects further requests until the window resets.

Rate limiting types

Teams in charge of a network's security can control the methods and parameters when setting up a rate limit. An organization can choose specific rate limit techniques to provide the right level of restriction. Let’s take a look at three common approaches to rate limiting: 

  • Geographical rate limits. An organization can set up parameters for specific geographic regions by creating a rate limit for certain areas during particular time frames. For example, if a company believes a specific area will be less active during nighttime hours, it can set a lower rate limit during this time. Geographical rate limits can help slow down traffic, which may reduce malicious actors or attacks. 
  • User rate limits. As the most popular type of rate limiting, user rate limits track the number of requests from a specific user, typically using their API key or user account. These limits may also track the IP address to prevent excessive requests from the same network.
  • Server rate limits. Organizations can also set a rate limit at the server level. This feature is often viewed as offering flexibility, allowing developers to increase rate limits on popular servers while decreasing traffic on less active ones. 

Rate limiting strategies

You can use various rate limiting strategies to protect sites from bot attacks. Common strategies include:  

Fixed window

Some rate limiting tools may utilize a fixed window algorithm to track requests and throttle them by separating them into fixed intervals or windows. This algorithm counts each request within each window, and if the number is over a predetermined limit, further requests are halted until the next window. 

For example, an organization’s server might accept up to 300 API requests per minute. Each window starts at a set time: the server won’t serve more than 300 requests between 8:00 and 8:01, but the window then resets, allowing another 300 requests before 8:02.
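As a rough illustration, here is a minimal Python sketch of a fixed window counter along those lines, assuming a limit of 300 requests per one-minute window. The names and structure are illustrative, not taken from any particular library.

```python
import time
from collections import defaultdict

LIMIT = 300          # requests allowed per window
WINDOW_SECONDS = 60  # one-minute windows, e.g. 8:00-8:01

# (client_id, window_number) -> requests seen in that window
# (a real system would also clean up counters for old windows)
counters = defaultdict(int)

def allow_request(client_id: str) -> bool:
    """Return True if this request fits inside the client's current fixed window."""
    window_number = int(time.time() // WINDOW_SECONDS)
    key = (client_id, window_number)
    if counters[key] >= LIMIT:
        return False  # over the limit; reject until the next window starts
    counters[key] += 1
    return True
```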

Sliding window

A sliding window algorithm is a rate limiting tool that also tracks requests and slows them down by separating requests into a sequence of overlapping windows to count the requests made within each window. 

The security tool works by tracking the number of requests made by each client over a window of fixed length. The limit designates the maximum number of requests a client can make within that time frame, and the window slides forward as time goes on, dropping old request counts and recording new ones.
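One widely used variant is the sliding window counter, which approximates a true sliding window by blending the previous fixed window's count with the current one. The Python sketch below is illustrative only; the limit, window length, and names are assumptions rather than details from a specific product.

```python
import time

LIMIT = 300   # maximum requests per sliding window
WINDOW = 60   # window length in seconds

# client_id -> (window_start, current_count, previous_count)
state = {}

def allow_request(client_id: str) -> bool:
    """Sliding window counter: weight the previous window's count by how much
    of it still overlaps the window ending now."""
    now = time.time()
    window_start = int(now // WINDOW) * WINDOW
    start, curr, prev = state.get(client_id, (window_start, 0, 0))
    if window_start != start:
        # A new fixed window has begun; the old current count becomes "previous".
        prev = curr if window_start - start == WINDOW else 0
        curr, start = 0, window_start
    elapsed_fraction = (now - window_start) / WINDOW
    estimated = prev * (1 - elapsed_fraction) + curr  # weighted request estimate
    if estimated >= LIMIT:
        state[client_id] = (start, curr, prev)
        return False
    state[client_id] = (start, curr + 1, prev)
    return True
```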

Token bucket 

A token bucket algorithm is a popular strategy often used by rate limiting solutions to track and suppress requests. This type of algorithm uses a bucket to store a predetermined number of tokens, each representing a request that a user can make.

When new requests are made, the tokens are taken from the bucket. When the bucket is empty, the requests are throttled until new tokens are available. Controlling when tokens are added to the bucket also controls the rate of requests. 

Many developers find the token bucket algorithm advantageous because it’s memory efficient: only a fixed number of tokens needs to be tracked in memory for each client.
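Here is a minimal Python sketch of a token bucket along those lines; the class name, capacity, and refill rate are illustrative assumptions.

```python
import time

class TokenBucket:
    """Token bucket: tokens refill at a steady rate, and each request spends one."""

    def __init__(self, capacity: int, refill_rate: float):
        self.capacity = capacity        # maximum tokens the bucket can hold
        self.refill_rate = refill_rate  # tokens added per second
        self.tokens = float(capacity)
        self.last_refill = time.monotonic()

    def allow_request(self) -> bool:
        now = time.monotonic()
        # Add tokens for the time elapsed since the last check, capped at capacity.
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last_refill) * self.refill_rate)
        self.last_refill = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False  # bucket is empty; throttle until tokens refill

# Example: allow bursts of up to 10 requests, refilling 2 tokens per second.
# bucket = TokenBucket(capacity=10, refill_rate=2.0)
```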

Leaky bucket

A leaky bucket algorithm is similar to a token bucket algorithm, but instead of spending tokens, incoming requests are added to a bucket that drains, or “leaks,” at a constant rate. Requests are processed in order as they leak out, and when the bucket is full, excess requests are queued or dropped until space frees up. Controlling how quickly the bucket leaks controls the rate of requests.

The leaky bucket algorithm is considered one of the easiest to implement. Rate limiting solutions allow a set amount of data to transmit at consistent rates, which is beneficial for applications requiring a steady stream of data. 

While the leaky bucket algorithm is considered a simple rate limiting tool, it’s also known to be less accurate than other algorithms. It may be less effective at tracking excess requests and enforcing rate limits because it relies on a fixed rate of data transmission instead of a fixed number of requests. 
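For comparison, here is a minimal Python sketch of a leaky bucket in which incoming requests fill the bucket and drain out at a constant rate; this version simply rejects requests that would overflow the bucket rather than queuing them, and the capacity and leak rate are illustrative assumptions.

```python
import time

class LeakyBucket:
    """Leaky bucket: requests fill the bucket and drain out at a constant rate."""

    def __init__(self, capacity: int, leak_rate: float):
        self.capacity = capacity    # maximum requests the bucket can hold
        self.leak_rate = leak_rate  # requests drained (processed) per second
        self.level = 0.0
        self.last_leak = time.monotonic()

    def allow_request(self) -> bool:
        now = time.monotonic()
        # Drain the bucket for the time elapsed since the last request.
        self.level = max(0.0, self.level - (now - self.last_leak) * self.leak_rate)
        self.last_leak = now
        if self.level < self.capacity:
            self.level += 1
            return True
        return False  # bucket is full; the request overflows and is rejected

# Example: hold at most 5 pending requests, draining 1 per second.
# bucket = LeakyBucket(capacity=5, leak_rate=1.0)
```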

Sliding log

The sliding log rate limiting technique creates a log of every client request within a specific time frame using a fixed-size sliding window. It’s helpful for advanced rate limiting situations, such as when a developer needs to distinguish between different types of clients or create complex rules for limiting requests.

A sliding log algorithm requires more resources than other solutions because it needs the server to store a larger, more detailed log of requests. 
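A sliding log can be sketched in a few lines of Python by keeping a timestamp for every request and discarding the ones that have aged out of the window; the limit and window length below are illustrative assumptions.

```python
import time
from collections import defaultdict, deque

LIMIT = 100  # maximum requests per window
WINDOW = 60  # window length in seconds

# One timestamp log per client; this is what makes the approach memory-hungry.
request_logs = defaultdict(deque)

def allow_request(client_id: str) -> bool:
    """Sliding log: count only the timestamps still inside the window ending now."""
    now = time.time()
    log = request_logs[client_id]
    # Drop entries that have slid out of the window.
    while log and log[0] <= now - WINDOW:
        log.popleft()
    if len(log) >= LIMIT:
        return False
    log.append(now)
    return True
```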

What are examples of rate limiting?

Users may experience rate limiting on popular social media applications or sites, including: 

  • Discord. Popular applications like Discord experience a high volume of user requests. To control the flow of requests, it uses rate limiting solutions to prevent service overload, spam, or abuse. A user may be rate limited if they send spam messages too quickly. 
  • X (formerly Twitter). Social media platforms like X generally use API rate limiting tools similar to rate limiting for websites. X allows third-party applications to integrate but only allows new posts and messages to refresh a specific number of times per hour. 
  • Snapchat. This application utilizes multiple rate limiting solutions to ensure stability. Rate limiting is employed across multiple user actions and at the token level. Users may experience rate limiting if they submit too many requests for actions such as sending snaps, adding friends, or logging in.
  • ChatGPT. ChatGPT, a product of OpenAI, has rate limiting solutions to ensure stability. The application sets rate limits, especially for users on the free tier, on the number of prompts allowed within a given time window.

How to avoid being rate limited accidentally

Many individuals experience rate limiting by accident. Here are the best strategies to avoid rate limiting on websites and applications: 

  1. Space out requests to prevent triggering a set rate limit (see the sketch after this list).
  2. Use APIs efficiently.
  3. Check platforms for specific rate limit policies.
  4. Avoid behaviors similar to bots when using social media platforms.
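As a sketch of the first point, the Python snippet below fetches a list of URLs one at a time with a pause between calls so the request rate stays under a typical limit; the endpoint URLs and interval are hypothetical placeholders.

```python
import time
import requests  # third-party HTTP client

def fetch_all_paced(urls, min_interval=1.0):
    """Fetch URLs one at a time, pausing between calls to stay under a rate limit."""
    responses = []
    for url in urls:
        responses.append(requests.get(url))
        time.sleep(min_interval)  # spacing requests out avoids tripping the limit
    return responses

# Hypothetical endpoints, for illustration only:
# fetch_all_paced(["https://api.example.com/posts?page=1",
#                  "https://api.example.com/posts?page=2"])
```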


Marijus Briedis

Marijus is a cybersecurity and privacy expert and the Chief Technology Officer at NordVPN (Nord Security). He’s been shaping NordVPN’s tech strategy and leading its engineering teams since 2019. Passionate about all things IT, Marijus has a gift for turning complex tech into clear, actionable insights. His positive, no-nonsense approach makes cybersecurity accessible to everyone.