
IBC Rate Limits: Introduction and State of the Art (1/3)

This is part 1 of our research report on bridge security, IBC, and rate limits as safety mechanisms. Built in collaboration with Osmosis.

RangeTeam · January 3, 2024


Introduction
State of the Art
Osmosis IBC Rate Limits (2/3)
Extending IBC Rate Limits (3/3)

1. Introduction

Bridge security is one of the key unsolved problems in blockchain infrastructure. Over $2.5B has been lost in bridge exploits over the last 18 months.

There are multiple reasons why bridge protocols are the target of security exploits:

Bridges handle large TVLs. Due to the architecture of most bridges, assets in the source chain are escrowed (or locked), after which synthetic versions of those assets are minted in the destination chain. These designs translate into large amounts of tokens being escrowed for long periods of time.
Complex codebases. A bridge’s function is to communicate and relay messages and assets between multiple blockchains. As such, bridge protocols need to understand and be able to verify multiple protocols at the same time, sometimes using a different programming language for each of their “endpoints”, which produces complex software systems. Complexity in software is correlated with larger attack surfaces and more vulnerable code.
Centralized designs. Developing trust-minimized cross-chain protocols is hard. Thus, many projects have taken shortcuts with more centralized designs. Relying on multisigs and trusted setups opens a whole new set of attack vectors, such as key compromise, as seen in the Ronin bridge exploit (>$600M) and the Harmony bridge exploit.

Since the launch of IBC 2 years ago, the Cosmos ecosystem has been able to leverage a trust-minimized arbitrary message-passing protocol, which has not suffered any major security incidents.

As mentioned above, centralized bridges are weaker designs in most cases. Historically, however, most bridge exploits have been due to software bugs and implementation errors rather than the exploitation of a weakness at the protocol level. Examples of this are the Wormhole exploit (>$300M) and Nomad ($190M). These examples show that no matter how sound a protocol is, there can always be implementation bugs, and trust-minimized bridges such as IBC or the canonical bridges between Ethereum and its L2s are no exception.

In October 2022, the BSC bridge suffered an exploit of over $100M, which could easily have become the largest hack in crypto history. The root cause was a flaw in the IAVL Merkle proof verification system, which relied on an unmaintained Cosmos IBC library. Soon after that, a group of developers and researchers discovered the Dragonberry vulnerability, which made it possible to forge proofs in IBC so that a malicious user could double-spend assets on multiple chains. Exploiting Dragonberry could have meant the loss of hundreds of millions of dollars in the Cosmos ecosystem. The vulnerability was patched in time, but it showed again that sound trust-minimized protocols are also vulnerable to software bugs.

In response to the BSC hack and the Dragonberry incident, the Osmosis team introduced IBC rate limits in October 2022. It was the first proposed standard for bridge rate limits implemented in production. Let’s explore the current implementation of IBC rate limits and how it compares with other designs and alternatives.


2. State of the Art

In this section, we cover the main categories of onchain safety and risk mitigation techniques, provide an overview of Osmosis IBC rate limits, and briefly compare them with alternatives from other projects.

2a. Onchain Bridge Safety

Securing mission-critical, high-stakes systems such as blockchain bridges requires a defense-in-depth approach, leveraging multiple security layers and mechanisms.

Traditionally, the standard approach to securing crypto protocols (including bridges) has focused on pre-deployment security. Pre-deployment security refers to security measures applied to a project before its deployment to mainnet, including audits and formal verification techniques. A focus on pre-deployment security is understandable when a protocol is fully decentralized and immutable. However, the reality is that the vast majority of protocols in crypto are not immutable and tend to grant a subset of users “sudo” powers to upgrade or pause the functionality of a protocol in emergencies.

It has been empirically demonstrated over the last couple of years that audits and pre-deployment security are not enough. Post-deployment security, in the form of monitoring, threat prevention, and onchain safety mechanisms, has been proposed as a complementary defensive layer since 2016. After the DAO hack, the first decentralized escape hatch for DAOs was proposed as a mechanism that could have avoided the exploit. However, the adoption rate of similar security mechanisms has been slow, with the exception of DAO governance timelocks.

Recently, given the large number and size of exploited funds in bridges, pioneering teams are revisiting these ideas to incorporate onchain safety and risk mitigation mechanisms to improve the security of cross-chain transactions in their projects.

Currently, we distinguish the following categories of cross-chain safety mechanisms:

Circuit breakers
Rate limits
Settlement delayers
Bridge Redundancy Protocols

Circuit breakers

Circuit breakers are safety mechanisms that trip or pause certain functionality of a system when specific conditions occur. Circuit breakers can have varying degrees of complexity, from a simple kill switch that pauses all the functionality of a protocol to more granular tripping per functionality, time window, or address.

For the purpose of this study, we’ll categorize as a circuit breaker any tripping mechanism that requires the invocation of a trigger by an off-chain 3rd party. In this context, an automated circuit breaker refers to the system in which this triggering is automated with off-chain monitoring systems.

An example of a circuit breaker is the Circuit module of the Cosmos SDK. Currently, the Circuit module features tripping capabilities per functionality (i.e., Cosmos message type) but doesn’t support per address or temporary tripping.
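To make the definition concrete, here is a minimal sketch of a per-message-type circuit breaker that only trips when an authorized off-chain party invokes it. The class, method names, and the "security-council" account are hypothetical and chosen for illustration; this is not the API of the Cosmos SDK Circuit module.

```python
class CircuitBreaker:
    """Hypothetical per-message-type breaker (not the Cosmos SDK Circuit module API)."""

    def __init__(self, authorized):
        self.authorized = set(authorized)  # accounts allowed to trip or reset
        self.tripped = set()               # message types currently disabled

    def _require_authorized(self, caller):
        if caller not in self.authorized:
            raise PermissionError(f"{caller} may not operate the breaker")

    def trip(self, caller, msg_type):
        # The trigger comes from an off-chain party: a human or a monitoring bot.
        self._require_authorized(caller)
        self.tripped.add(msg_type)

    def reset(self, caller, msg_type):
        self._require_authorized(caller)
        self.tripped.discard(msg_type)

    def allows(self, msg_type):
        return msg_type not in self.tripped


breaker = CircuitBreaker(authorized={"security-council"})
breaker.trip("security-council", "MsgTransfer")
```

After the call to trip, transfers are rejected while every other message type keeps working, which mirrors the per-functionality granularity described above. Per-address or temporary tripping would need extra state that this sketch does not model.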

Rate Limits

Rate limits are safety controls that disable certain functionality of a system when a pre-defined threshold is surpassed. In this context, we categorize as rate limits only those mechanisms that don’t require an external trigger invocation, because the tripping conditions are defined in the protocol itself. As such, rate limit thresholds tend to be simple and numeric.

There are different types of rate limits based on the threshold type:

Value rate limits trip when a specific amount of funds has flowed through the system during a pre-defined time window. The threshold can be defined in kind (e.g., an amount of tokens) or denominated in another currency (e.g., a dollar-based value).
Volume rate limits (or speed bumps) trip when a specific functionality of a protocol has been used more times than the threshold over a pre-defined time window. An example would be that a certain function has been called more than X times in the last hour. This type of rate limit is analogous to web API rate limits.
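Both types can be sketched with the same fixed-window counter: a value rate limit adds the transferred amount, while a volume rate limit adds one unit per call. The class name, thresholds, and window length below are illustrative assumptions rather than values from any production system.

```python
class FixedWindowLimit:
    """Rejects an action once the units recorded in the current window would exceed `threshold`."""

    def __init__(self, threshold, window_seconds):
        self.threshold = threshold
        self.window_seconds = window_seconds
        self.window_start = 0
        self.used = 0

    def try_consume(self, units, now):
        # Start a fresh window once the previous one has fully elapsed.
        if now - self.window_start >= self.window_seconds:
            self.window_start = now
            self.used = 0
        if self.used + units > self.threshold:
            return False  # limit tripped: the action is rejected
        self.used += units
        return True


# Value rate limit: at most 1_000 tokens per hour.
value_limit = FixedWindowLimit(threshold=1_000, window_seconds=3_600)
# Volume rate limit: at most 3 calls per hour (each call counts as one unit).
volume_limit = FixedWindowLimit(threshold=3, window_seconds=3_600)
```

No external party is involved here: the tripping condition is fully determined by the counter and the threshold, which is the property that separates rate limits from circuit breakers in this report.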

Delayers

Settlement delayers impose a delay window to certain transactions, during which 3rd party monitoring systems can flag if a transaction is malicious. For example, a bridge could impose a 1hr delay on transfers over $1M.

The threshold doesn’t need to be a constant value. For example, there are alternative mechanisms that could define a decaying function, so the latency between transaction proposal and settlement is proportional to the amount being withdrawn or transferred.
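The two policies can be sketched as follows. The fixed policy uses the $1M and 1hr figures from the example above; the variable policy assumes a simple linear relation with a cap, which is only one of many possible functions, and all function and parameter names are hypothetical.

```python
def fixed_delay_seconds(amount_usd, threshold_usd=1_000_000, delay=3_600):
    """Fixed policy: transfers above the threshold wait `delay` seconds, others settle at once."""
    return delay if amount_usd > threshold_usd else 0


def proportional_delay_seconds(amount_usd, seconds_per_million=3_600, max_delay=86_400):
    """Variable policy (assumed linear): the delay grows with the amount, up to `max_delay`."""
    delay = amount_usd * seconds_per_million // 1_000_000
    return min(delay, max_delay)
```

With these defaults, a $500k transfer settles immediately under the fixed policy but waits 30 minutes under the proportional one, while a $100M transfer is capped at a 24-hour delay.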

Timelocks or speed bumps, which have been common to safeguard DAO governance proposals and upgrades, are another example of a delayer mechanism customized to a particular message and payload.

Bridge Redundancy Protocols

Bridge redundancy protocols are preventative safety mechanisms that relay messages over several bridge protocols and/or clients, adding redundancy in case any of them is affected by a bug or exploit. Examples of bridge redundancy approaches are MMA (led by the Uniswap DAO) and Hashi.
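The core idea can be sketched as a k-of-n agreement check on the destination chain: a message is accepted only if enough independent bridges delivered the same payload. This is a generic illustration with hypothetical names, not the actual logic of MMA or Hashi.

```python
from collections import Counter


def accept_message(reports, required):
    """Return the payload delivered by at least `required` bridges, else None.

    `reports` maps a bridge name to the payload it relayed.
    """
    if not reports:
        return None
    payload, votes = Counter(reports.values()).most_common(1)[0]
    return payload if votes >= required else None


# One compromised bridge cannot force a forged payload through a 2-of-3 check.
reports = {"bridge_a": "mint 9999", "bridge_b": "mint 10", "bridge_c": "mint 10"}
accepted = accept_message(reports, required=2)
```

The trade-off is latency and cost: every message has to travel over several bridges before it can be executed.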

2b. Osmosis IBC Rate Limits

Overview

To our knowledge, the Osmosis IBC rate limit module is the first standard governance-configurable implementation of a cross-chain bridge token transfer rate limit in the whole crypto ecosystem.

Rate limits are expressed in static time periods (e.g., 24 hours) and measure the net flow of an asset (inflows vs. outflows) compared with the quota of a channel. If the quota is surpassed in the given time period, no further IBC transfer will be allowed until the next period starts.
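A minimal sketch of this net-flow accounting is shown below. It assumes a single symmetric quota expressed in tokens and a period that restarts on the first transfer after expiry; the class and its fields are hypothetical and do not reproduce the Osmosis contract.

```python
class NetFlowLimit:
    """Tracks inflow and outflow per static period and rejects transfers past the quota."""

    def __init__(self, quota, period_seconds, start=0):
        self.quota = quota
        self.period_seconds = period_seconds
        self.period_end = start + period_seconds
        self.inflow = 0
        self.outflow = 0

    def try_transfer(self, direction, amount, now):
        # A new period starts with fresh counters once the old one has expired.
        if now >= self.period_end:
            self.period_end = now + self.period_seconds
            self.inflow = self.outflow = 0
        inflow = self.inflow + (amount if direction == "in" else 0)
        outflow = self.outflow + (amount if direction == "out" else 0)
        if abs(inflow - outflow) > self.quota:
            return False  # net flow would surpass the quota: transfer rejected
        self.inflow, self.outflow = inflow, outflow
        return True


limit = NetFlowLimit(quota=100, period_seconds=86_400)
```

Because the limit is on net flow, an inflow of 50 frees up room for a further 50 of outflow within the same period.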

The rate limit logic is implemented as a CosmWasm smart contract, which interacts with an IBC Middleware package that wraps the standard ICS20 transfer application.

Osmosis rate limits have influenced the creation of other adjacent solutions, such as Stride’s IBC rate limits, which are implemented as a native Cosmos SDK module.

Osmosis Rate Limits Timeline

Rate limits on Osmosis v13 upgrade
Osmosis post on 30/10/2022.
Osmosis community update on 5/12/2022.
Release on GitHub.
Governance proposal posted 18/12/2022.
Signaling proposal to establish initial rate limits params passed on 18/02/2023.
Rate limits initial params set on v15 upgrade
Osmosis community update on 13/3/2023.
Release on GitHub.

Implementation Details

Rate limits are grouped together using composite keys in the format of (channel_id, denom), with each keyspace able to contain multiple rate limit rules. In addition, the special “any” keyspace can be used to apply rate limits against all transfers.

Any time a rate limit is examined for a particular denom and channel, all rules are evaluated sequentially, and the transaction is rate limited if either the channel-specific rules or the “any” rules fail. Note that all rate limit rules must be examined before the transaction fails, so a large number of rate limits can cause non-negligible resource consumption.
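The lookup and evaluation can be sketched as below. The sketch assumes the catch-all keyspace is addressed by using the literal string "any" in place of the channel id, and it models each rule as a plain predicate on the amount; both are simplifications for illustration, not the contract's data model.

```python
def rules_for(rules, channel_id, denom):
    """Collect the rules stored under (channel_id, denom) and under the assumed ("any", denom) key."""
    return rules.get((channel_id, denom), []) + rules.get(("any", denom), [])


def check_transfer(rules, channel_id, denom, amount):
    # Every rule is evaluated before a verdict is returned (no early exit),
    # which is why a large number of rules costs resources on each transfer.
    results = [rule(amount) for rule in rules_for(rules, channel_id, denom)]
    return all(results)


rules = {
    ("channel-0", "uatom"): [lambda a: a <= 1_000, lambda a: a <= 500],
    ("any", "uatom"): [lambda a: a <= 300],
}
```

In this example, a transfer of 400 uatom over channel-0 passes both channel-specific rules but is still rejected by the catch-all rule.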

Denoms are grouped by whether they are native or non-native to Osmosis. Non-native tokens use a denom in the format of ibc/sha256(denom_trace).
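As an illustration of the non-native format, the sketch below hashes a denom trace and renders it as an uppercase hex string, which is how IBC denoms are conventionally displayed. The helper name and the example trace are illustrative assumptions.

```python
import hashlib


def ibc_denom(denom_trace: str) -> str:
    """Build an ibc/<HASH> denom from a trace such as 'transfer/channel-0/uatom'."""
    digest = hashlib.sha256(denom_trace.encode("utf-8")).hexdigest().upper()
    return f"ibc/{digest}"


# Example trace: a token with base denom uatom received over port transfer, channel-0.
example_denom = ibc_denom("transfer/channel-0/uatom")
```

The same base token arriving over a different channel has a different trace and therefore a different ibc/ denom, which is why rate limits are keyed by both channel and denom.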

Rate limits measure the inflow and outflow of assets, and fail when they surpass a percentage of the rate limit’s measured channel value. For native assets, the channel value is the total amount of escrowed tokens across all IBC channels, while for non-native assets it is the total supply of the token on Osmosis. Channel values are static during any given time period and will only be updated after a period has expired.
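The quota check can be sketched with integer arithmetic as follows. The function and parameter names are hypothetical; the only behavior taken from the text is the choice of channel value per asset type and the comparison against a percentage of a value that is fixed at the start of the period.

```python
def channel_value(is_native, escrowed_all_channels, total_supply):
    """Native assets use the escrowed amount across channels; non-native assets use total supply."""
    return escrowed_all_channels if is_native else total_supply


def exceeds_quota(flow, value_at_period_start, max_percentage):
    # flow / value > pct / 100  is rewritten as  flow * 100 > value * pct
    # to stay in integer math, as on-chain code typically avoids floats.
    return flow * 100 > value_at_period_start * max_percentage


# A 30% quota on a channel value of 1_000 tokens allows a flow of up to 300.
value = channel_value(is_native=True, escrowed_all_channels=1_000, total_supply=5_000)
```

Because the channel value is only refreshed when a period expires, a sudden change in escrowed funds or supply in the middle of a period does not move the quota.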

At the moment, most rate limits use daily, weekly, and monthly time periods. However, time periods can be very granular, as they are measured in seconds. That said, it’s recommended that time periods be long enough for any rate-limited failure to be analyzed for security risks before the period resets. In practice, the shortest period in frequent use appears to be 1 day.

2c. Other Examples

Stride Cosmos SDK

Stride, a project building multichain liquid staking solutions in the Cosmos ecosystem, has developed a variation of IBC rate limits. Inspired by the Osmosis implementation, Stride's is the only native Cosmos SDK module solution. This could have several benefits, but mainly it could make IBC rate limits more portable to any Cosmos chain, as the Osmosis IBC rate limits have several requirements for host chains such as CosmWasm.

In the case of Stride, there are two main reasons to use rate limits:

Minimize the damage in the case of an exploit.
Limit and smooth the price variation between native and derivative assets (stAssets), to prevent potential depegs.

Implementation

In Stride's implementation, each rate limit is applied at a ChannelID + Denom granularity and is evaluated in evenly spaced fixed windows. For instance, a rate limit might be specified on uosmo (denominated as ibc/D24B4564BCD51D3D02D99[...]A9BD54CB8A5EA34 on Stride), on the Stride <-> Osmosis transfer channel (channel-5), with a 24 hour window.

Wormhole Governor

The Wormhole bridge suffered one of the largest exploits of 2022, losing more than $300M. Since then, Wormhole has become one of the pioneers in bridge security, investing in new mechanisms and heavy R&D. One of those mechanisms is the Wormhole Governor, which implements a combination of rate limits and delays for large transactions.

The Governor enables Guardians to limit the amount of notional value that can be transferred out of a given chain within a sliding time period, with the aim of protecting against external risks such as smart contract exploits or runtime vulnerabilities.

The current implementation works on two classes of transactions (large and small) and the current configuration can be found here:

Large Transactions. A transaction is large if it is greater than or equal to the bigTransactionSize for a given origin chain. All large transactions will have a mandatory 24-hour finality delay and will have no effect on the dailyLimit.
Small Transactions. A transaction is small if it is less than the bigTransactionSize for a given origin chain. All small transactions will have no additional finality delay up to the dailyLimit defined within a 24hr sliding window. If a small transaction exceeds the dailyLimit, it will be delayed until either it fits inside the dailyLimit, in which case it is counted toward the dailyLimit, or it has been delayed for 24 hours, in which case it has no effect on the dailyLimit.

The above checks will produce 3 possible scenarios:

Non-Governed Message: If a message does not pass checks (1-4), ChainGovernor will indicate that the message can be published.
Governed Message (Large): If a message is “large”, ChainGovernor will place the message in a queue and wait for 24hrs before signing the VAA.
Governed Message (Small): If a message is “small”, ChainGovernor will determine if it fits inside the dailyLimit for this chain. If it does fit, it will be signed immediately. If it does not fit, it will wait in the queue until it does fit. If it does not fit in 24hrs, it will be released from the queue.
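The large/small classification and the 24-hour sliding window can be sketched as follows. This is a simplified model with hypothetical names: it returns a decision for each transfer but does not model the queue itself, the later release of enqueued messages, or VAA signing.

```python
from collections import deque

DAY = 24 * 60 * 60


class GovernorSketch:
    """Simplified model of the two transaction classes; not the Guardian implementation."""

    def __init__(self, big_transaction_size, daily_limit):
        self.big_transaction_size = big_transaction_size
        self.daily_limit = daily_limit
        self.recent = deque()  # (timestamp, value) of small transfers counted toward the limit

    def _used(self, now):
        # Sliding window: forget transfers that are 24 hours old or older.
        while self.recent and now - self.recent[0][0] >= DAY:
            self.recent.popleft()
        return sum(value for _, value in self.recent)

    def classify(self, value, now):
        if value >= self.big_transaction_size:
            return "delay_24h"  # large: always delayed, never counted toward daily_limit
        if self._used(now) + value <= self.daily_limit:
            self.recent.append((now, value))
            return "publish"    # small and within the limit: signed immediately
        return "enqueue"        # small but over the limit: waits until it fits or 24h pass


governor = GovernorSketch(big_transaction_size=1_000, daily_limit=1_500)
```

Note that an enqueued transfer is not added to the window here, which matches the rule that only transfers released within the limit count toward the dailyLimit.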

While messages are enqueued, any Guardian has a window of opportunity to determine if a message is fraudulent using their own processes for fraud detection. If Guardians determine a message is fraudulent, they can delete the message from their own independently managed queue. If a super minority of Guardians (7 of 19) delete a message from their queues, this fraudulent message is effectively censored as it can no longer reach a super-majority quorum.
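The arithmetic behind the 7-of-19 figure can be checked with a short sketch. It assumes the super-majority quorum means strictly more than two thirds of the Guardian set; the function names are hypothetical.

```python
def quorum(total):
    """Smallest number of signers strictly greater than two thirds of `total` (assumed rule)."""
    return total * 2 // 3 + 1


def is_censored(total, deletions):
    # A message is censored when the Guardians that still hold it cannot reach quorum.
    return total - deletions < quorum(total)


needed = quorum(19)
```

Under this assumption the quorum for 19 Guardians is 13, so once 7 Guardians delete a message only 12 potential signers remain and the message can no longer be approved, whereas 6 deletions would still leave exactly 13.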

ERC7281

ERC7281 is a proposal to extend the ERC20 interface so that token issuance, ownership, and sovereignty move from the bridge to the token issuer. This extension would enable token issuers to add bridge-specific rate limits for a given token.

That's a wrap for part 1 of our series on IBC Rate Limits. In the coming days, we'll publish Part 2, a deep dive into Osmosis Rate Limits that includes a historical analysis of rate-limited assets and bridge inflows and outflows.
