Bridges

IBC Rate Limits: Extending IBC Rate Limits (3/3)

This is part 3 of our research report on bridge security, IBC, and rate limits as safety mechanisms. Built in collaboration with Osmosis.
Andres Monty
January 19, 2024

This is the 3rd and final part of our research series on IBC Rate Limits. If you haven't been able to read Parts 1 and 2, we strongly encourage reading them first to get the most out of this post. Below you can find the Table of Contents of the series:

  1. Introduction
  2. State of the Art (1/3)
  3. Osmosis IBC Rate Limits (2/3)
  4. Extending IBC Rate Limits (3/3)

Now, let's get started with Part 3, where we explore new mechanisms and improvements to the current IBC Rate Limits implementation based on the data-driven analysis and frameworks that we developed in Part 2. It should be noted that the goal of this research post is to explore the different directions available, not to advocate for or recommend a specific feature to be implemented in the particular case of Osmosis. If you want to see our recommendations for Osmosis IBC Rate Limits, check out the last section of Part 2.

4. Extending IBC Rate Limits

Since this section is dense in content, we provide below a table of contents to ease the digestion of the materials:

4a. Automatic Quotas as Safety Backstop

4b. Mitigating Boundary Attacks

   i. Automatic Period Rollovers

   ii. Two Period Average

   iii. Decay N-period Average

4c. Notional Value Rate Limits

   i. Oracle risks

   ii. Safety Mechanisms for Integrating Oracles

   iii. Further resources

4d. Alternative Mechanisms

   i. Speed Bumps or Timelocks

   ii. Large Transaction Delay

   iii. Value-based Latency

   iv. Message-based Rate Limits

4e. Conditional Rate Limit Bypassing

   i. Sender-based Allowlist

   ii. NFT-gated Bypass

   iii. Per-transaction Bypass

   iv. Initial Grace Period

The current implementation of Osmosis IBC rate limits is the first step to ensuring cross-chain safety. In this section, we explore more advanced mechanisms and potential alternatives to improve the current implementation and analyze their potential challenges and pitfalls.

4a. Automatic Quotas as Safety Backstop

Currently, a governance proposal is needed to create or update the rate limit and quota of a given channel. Thus, when a new IBC channel is created, assets will be able to flow freely from and to Osmosis (except for denoms that are rate limited on [.in-line-code]channel=any[.in-line-code]). Until a governance proposal gets approved and executed, the newly created channel won’t have any type of rate limit, diminishing the effectiveness of the safety mechanism.

By enabling automatic registration of rate limits with default values, the overall security of Osmosis can be increased by avoiding delays due to governance and external processes needing to create rate limits anytime a new message, denom, or channel is added.

Enabling default rate limit registration will consist of adding a new variant to the [.in-line-code]Sudo[.in-line-code] message such as [.in-line-code]RegisterDefaultLimits[.in-line-code], while adding the following function to handle rate limit creation:

pub fn automatic_rate_limit_creation(
    deps: DepsMut,
    env: Env,
    packet: Packet,
    flow_type: FlowType,
) -> Result<Response, ContractError> {
    let (channel_id, denom) = packet.path_data(&flow_type); // Sends have direction out.
    let path = &Path::new(channel_id, &denom);
    let funds = packet.get_funds();

    // Default quotas: 30% per day and 60% per week, in each direction.
    let path_msg = PathMsg::new(path.channel.clone(), denom.clone(), vec![
        QuotaMsg::new("daily", 86400, 30, 30),
        QuotaMsg::new("week", 604800, 60, 60),
    ]);

    add_new_paths(deps, vec![path_msg], env.block.time)?;

    Ok(Response::new()
        .add_attribute("method", "auto_rate_limit")
        .add_attribute("channel_id", path.channel.to_string())
        .add_attribute("denom", path.denom.to_string())
        .add_attribute("funds", funds.to_string())
        .add_attribute("quota", "default"))
}

Full PoC Implementation

The proof of concept implementation for default rate limit creations can be found here: https://github.com/teamscanworks/ibc-rate-limits/pull/5

4b. Mitigating Boundary Attacks

The current Osmosis IBC rate limit design is susceptible to boundary attacks, which are a form of attack that attempts to circumvent rate limits by conducting attacks shortly before a time period ends and concluding the attack after the new time period begins, thus practically bypassing the intended rate limit threshold. This is possible because values tracked by the rate limits reset to 0 when a new period begins.

Example Of A Boundary Attack

An attacker has identified a vulnerability allowing 200k USDC to be minted on Osmosis, which may be transferred through IBC to another chain. A rate limit is configured, allowing for a maximum of 100k tokens to be transferred during any given window. Such a scenario leads to two possible options for exfiltrating the falsely minted USDC:

  • Naive strategy: The attacker immediately mints 200k USDC, transferring 100k through Axelar. The Osmosis core devs notice the attack and submit a governance proposal that fixes the underlying issue and blocks further transfers. The attacker is only able to exfiltrate 100k USDC.
  • Smart attacker: The attacker checks when the rate limit’s current period ends, waiting until a short multiple of block time before the expiration. Just before the time period ends, 100k USDC is minted and transferred. After waiting for the new period to begin, the attacker mints the remaining 100k USDC and transfers them. Thanks to the boundary attack, the exploiter is able to exfiltrate 200k USDC.
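The smart attacker's strategy can be sketched with a toy rate limiter whose tracked outflow resets to zero at each period boundary. This is an illustrative model, not the Osmosis implementation; all names and values are made up for the example:

```rust
/// Toy per-period rate limiter: the tracked outflow resets to zero
/// whenever a new period begins, which is what enables boundary attacks.
struct ToyRateLimiter {
    limit: u64,        // max outflow per period
    period_secs: u64,  // period length
    period_start: u64, // timestamp at which the current period began
    outflow: u64,      // value sent in the current period
}

impl ToyRateLimiter {
    fn try_send(&mut self, now: u64, amount: u64) -> bool {
        // Roll over: reset the counter if the period has expired.
        if now >= self.period_start + self.period_secs {
            self.period_start = now;
            self.outflow = 0;
        }
        if self.outflow + amount > self.limit {
            return false; // quota exceeded
        }
        self.outflow += amount;
        true
    }
}

fn main() {
    let mut rl = ToyRateLimiter { limit: 100_000, period_secs: 86_400, period_start: 0, outflow: 0 };
    // Just before the boundary: drain the full quota...
    assert!(rl.try_send(86_399, 100_000));
    // ...a second transfer in the same period is rejected...
    assert!(!rl.try_send(86_399, 100_000));
    // ...but two seconds later a new period has begun, so the attacker
    // moves another 100k: 200k total despite a 100k-per-period limit.
    assert!(rl.try_send(86_401, 100_000));
}
```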

4bi. Automatic Period Rollovers

Current rate limits use time periods that are not automatically rolled over upon expiration of the current period, instead relying on evaluation of the rate limit to trigger rollover. This is performed whenever the rate limit [.in-line-code]allow_transfer[.in-line-code] method is invoked. Upon function invocation, the current flow is cached, the rate limit is checked for period expiration, and, if needed, channel values are recalculated along with a period rollover.

By changing this design such that period rollover can only take place via a permissioned Sudo message executed by the chain, we allow for automatic rollovers that take place during the [.in-line-code]BeginBlock[.in-line-code] or [.in-line-code]EndBlock[.in-line-code] stages of IBC middleware. Such a mechanism also allows for more flexible rate limit configurations while also providing the framework for enabling more complex rules.

Notably, the lack of rolling time windows can adversely affect UX, especially during periods of high volatility, as it becomes possible for “stale” values to be cached in the rate limit storage. Additionally, automatic rollover theoretically reduces the attack surface, as an attacker is no longer able to influence when the rate limit is rolled over.

Automatic Period Rollover

The current rate limit design uses a period rollover process, which can only be triggered when the rate limit is evaluated. The alternative option is to enable automatic period rollover, which can be triggered by chain-based keepers. Although such a design doesn’t explicitly fix boundary attacks, it allows for the implementation of more robust boundary attack mitigation strategies.

Furthermore, if the period rollover is limited only to chain keepers, we can apply the rollover during the [.in-line-code]EndBlock[.in-line-code] steps of the IBC middleware, thus enforcing transactions executing in a block to be evaluated using values that are not reset. Such an implementation can potentially increase the difficulty of conducting boundary attacks as the attacker is not able to reset the rate limit before the rate limit evaluation.

Implementation Details

Automatic rollovers can be implemented in two different ways, using either the [.in-line-code]BeginBlock[.in-line-code] or [.in-line-code]EndBlock[.in-line-code] function in [.in-line-code]x/ibc-rate-limit/ibcratelimitmodule/module.go[.in-line-code] and updating the rate-limiter contract.

The safest option for handling the rollover process is likely by triggering it in the [.in-line-code]EndBlock[.in-line-code] function since even when the period expires, transactions in that block are evaluated before the tracked value is reset to 0.

The rate limiter contract [.in-line-code]Sudo[.in-line-code] message type can be extended to include a new variant (i.e., [.in-line-code]RolloverRules[.in-line-code]), which, when called, executes logic similar to the following:

for (key, mut rules) in RATE_LIMIT_TRACKERS
    .range(deps.storage, None, None, cosmwasm_std::Order::Ascending)
    .flatten()
    .collect::<Vec<_>>()
{
    // avoid storage saves unless an actual rule was updated
    let mut rule_updated = false;
    rules.iter_mut().for_each(|rule| {
        if rule.flow.is_expired(env.block.time) {
            rule.flow.expire(env.block.time, rule.quota.duration);
            rule_updated = true;
        }
    });
    if rule_updated {
        RATE_LIMIT_TRACKERS.save(deps.storage, key, &rules)?;
    }
}

Then, update the [.in-line-code]BeginBlock[.in-line-code] or [.in-line-code]EndBlock[.in-line-code] functions to execute logic similar to the following:

contract := am.ics4wrapper.GetContractAddress(ctx)
if contract == "" {
    return
}
contractAddr, err := sdk.AccAddressFromBech32(contract)
if err != nil {
    return
}
asJson, err := json.Marshal("rollover_rules")
if err != nil {
    return
}
_, err = am.ics4wrapper.ContractKeeper.Sudo(ctx, contractAddr, asJson)
if err != nil {
    return
}

Full PoC Implementation

The proof of concept implementation of Automatic Rollovers can be found here: https://github.com/teamscanworks/ibc-rate-limits/pull/3

4bii. Two Period Average

The simplest option for minimizing the ability to conduct boundary attacks is by averaging out the value tracked by a rate limit across two different time periods, which we’ll term A and B. By averaging out the values from A with the current tracked value in B, the impact of the tracked value being reset does not result in rate limit evaluation starting at 0, reducing the potential scope of damage.

Image 1: Daily supply change of basecro in Osmosis, with the rate limits in dashed lines

Consider the following example of using a two-period average that limits transfers of OSMO to 100k in a single period. In period A, the attacker transfers 100k OSMO. When the period ends, instead of being able to transfer an additional 100k OSMO, only 50k OSMO can be transferred. With the current rate limit design, the attacker would be able to transfer another 100k OSMO, since unaveraged values are used, allowing for full capacity in period B.
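The arithmetic of this example can be sketched as follows. The only assumption is a 50/50 weighting of the previous and current period's flow; the function name is illustrative:

```rust
/// Remaining capacity under a two-period average: the flow from the
/// previous period (A) is averaged with the current period's flow (B),
/// so a drained period A still counts for half at the start of period B.
fn remaining_capacity(limit: u64, previous_flow: u64, current_flow: u64) -> u64 {
    let averaged = (previous_flow + current_flow) / 2;
    limit.saturating_sub(averaged)
}

fn main() {
    // Period A: the attacker moved the full 100k quota.
    // At the start of period B, only 50k of capacity remains...
    assert_eq!(remaining_capacity(100_000, 100_000, 0), 50_000);
    // ...and after sending those 50k, averaged usage reaches
    // (100k + 50k) / 2 = 75k, leaving 25k, and so on.
    assert_eq!(remaining_capacity(100_000, 100_000, 50_000), 25_000);
    // Without averaging (previous period ignored), the attacker would
    // again have the full 100k available.
    assert_eq!(remaining_capacity(100_000, 0, 0), 100_000);
}
```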

Although this is possible to implement without automatic period rollovers, relying on rate limit evaluation to trigger period rollovers performs poorly in times of high volatility.

Two-period Average Implementation

By extending the rate limit object to the following, we have the ability to store inflow, outflow, and channel values from period A when rolling over into period B:

/// current state (i.e.: how much value has been transferred in the current period)
#[derive(Serialize, Deserialize, Clone, Debug, PartialEq, Eq, JsonSchema)]
pub struct RateLimit {
    pub quota: Quota,
    pub flow: Flow,
    pub previous_channel_value: Option<Uint256>,
    pub previous_inflow: Option<Uint256>,
    pub previous_outflow: Option<Uint256>,
}

In order to calculate the averaged values of capacity, we can add the following functions:

pub fn averaged_channel_value(&self) -> Option<cosmwasm_std::Decimal256> {
    // when a rule is first initialized there is no previous period value, so there is nothing to average
    if self
        .previous_channel_value
        .unwrap_or(Uint256::zero())
        .is_zero()
    {
        return Some(cosmwasm_std::Decimal256::new(self.quota.channel_value?));
    }
    Some(
        (cosmwasm_std::Decimal256::new(self.quota.channel_value?)
            + cosmwasm_std::Decimal256::new(self.previous_channel_value?))
            / cosmwasm_std::Decimal256::from_atomics(2_u64, 0).ok()?,
    )
}

pub fn averaged_capacity(&self) -> Option<(Uint256, Uint256)> {
    let averaged_channel_value = self.averaged_channel_value()?;
    let averaged_channel_value: Uint256 = averaged_channel_value.atomics();
    Some((
        averaged_channel_value * Uint256::from(self.quota.max_percentage_recv)
            / Uint256::from(100_u32),
        averaged_channel_value * Uint256::from(self.quota.max_percentage_send)
            / Uint256::from(100_u32),
    ))
}

pub fn averaged_capacity_on(&self, direction: &FlowType) -> Option<Uint256> {
    let (max_in, max_out) = self.averaged_capacity()?;
    match direction {
        FlowType::In => Some(max_in),
        FlowType::Out => Some(max_out),
    }
}

4biii. Decay N-period Average

A significant disadvantage of the two-period average method is that it has very poor UX in periods of high volatility, even with automatic period rollovers, due to the static value used from period A. To address this, a decaying function can be applied to the value from period A before it is averaged against period B. The UX is improved because the impact that high volatility has on subsequent time periods shrinks the further into a period the rate limit is: at the start of period B, the value of period A is used as is, decaying to 0 by the end of period B.

A slightly more advanced implementation of the two aforementioned averaging strategies would be to use a decaying function that decays a value to 0 over N periods. For example, a decay four-period average would decay a value from the end of period A to 0 by the end of period E.
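For the N=2 case, the decay applied to the previous period's value can be sketched as a simple linear ramp over the current period. This is an illustrative helper under the assumption of linear decay; the function name and values are made up:

```rust
/// Linearly decay the previous period's tracked value to zero over the
/// course of the current period: full weight at the start, none at the end.
fn decayed_previous_value(previous_value: u64, elapsed_secs: u64, period_secs: u64) -> u64 {
    if elapsed_secs >= period_secs {
        return 0;
    }
    // previous_value * (1 - elapsed / period), in integer arithmetic
    previous_value * (period_secs - elapsed_secs) / period_secs
}

fn main() {
    // At the start of period B the full 100k from period A is counted...
    assert_eq!(decayed_previous_value(100_000, 0, 86_400), 100_000);
    // ...halfway through only 50k still weighs on the average...
    assert_eq!(decayed_previous_value(100_000, 43_200, 86_400), 50_000);
    // ...and by the end of period B it has decayed to zero.
    assert_eq!(decayed_previous_value(100_000, 86_400, 86_400), 0);
}
```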

To showcase this design, we implement the simplest version of the Decay N-period Average, with [.in-line-code]N=2[.in-line-code]:

Decay Two Period Average Implementation

Building upon the two-period example, we can further extend the rate limit object with the following:

#[derive(Serialize, Deserialize, Clone, Debug, PartialEq, Eq, JsonSchema)]
pub struct RateLimit {
    pub quota: Quota,
    pub flow: Flow,
    pub previous_channel_value: Option<Uint256>,
    pub previous_inflow: Option<Uint256>,
    pub previous_outflow: Option<Uint256>,
    pub period_start: Option<Timestamp>,
}

By adding the following function, we can calculate how far into the current period we are and use the output as the parameter used when calculating the decayed value:

pub fn period_percent_passed(
    &self,
    // block_time_second is the timestamp retrieved from the block header
    block_time_second: u64,
) -> Option<cosmwasm_std::Decimal256> {
    let period_start_seconds = self.period_start?.seconds();
    Some(cosmwasm_std::Decimal256::percent(
        ((block_time_second - period_start_seconds) * 100)
            / (self.flow.period_end.seconds() - period_start_seconds),
    ))
}

To calculate the decayed channel value, we can use a function similar to the following:

let percent_passed = self.period_percent_passed(env.block.time.seconds())?;
let previous_channel_value = cosmwasm_std::Decimal256::new(self.previous_channel_value?);
let decayed_amount = previous_channel_value * percent_passed;
self.decayed_value = Some(previous_channel_value - decayed_amount);

Then to calculate the decayed two-period average:

pub fn averaged_channel_value(&self) -> Option<cosmwasm_std::Decimal256> {
    // when a rule is first initialized there is no previous period value, so there is nothing to average
    if self
        .previous_channel_value
        .unwrap_or(Uint256::zero())
        .is_zero()
    {
        return Some(cosmwasm_std::Decimal256::new(self.quota.channel_value?));
    }
    Some(
        (cosmwasm_std::Decimal256::new(self.quota.channel_value?) + self.decayed_value?)
            / cosmwasm_std::Decimal256::from_atomics(2_u64, 0).ok()?,
    )
}

Full PoC Implementation

The proof of concept implementation of rolling time periods with decay two-period average can be found here: https://github.com/teamscanworks/ibc-rate-limits/pull/4

4c. Notional Value Rate Limits

Existing rate limit documentation suggests the intention of using USDC prices of assets and total dollar value to limit transfers. In this section, we explore the challenges of relying on oracles to achieve a notional value rate limit implementation.

4ci. Oracle Risks

To implement USDC-denominated rate limits based on oracles, great care needs to be taken to ensure the fitness of the prices sourced via the oracle.

Image 2: Net inflow for channel-5, denominated in $

For example, prices sourced from CLMMs (concentrated liquidity market makers) are not well suited for reliable price tracking, particularly in periods of high volatility, with a level of security that decreases exponentially faster than that of CPMM oracles the lower the available liquidity gets:

This is a bigger problem for Uniswap’s V3 TWAP compared to V2 since liquidity providers aren’t incentivised to provide full-range liquidity, in fact, they’re incentivised to put it in narrower ranges that can earn more fees. Exploiters can wait for a dump or pump that would place the current price past a concentrated mass of liquidity, and thus more easily push and pull the price in their preferred direction afterwards.

In general, oracle implementations (especially CLMM-based) need to be extremely robust and well-audited due to the wide variety of exploits that can be introduced. Some resources are included below which detail the security risks of oracles, notably CLMM-based:

4cii. Safety Mechanisms for Integrating Oracles

When possible, it’s best to use pre-existing oracle solutions that are battle-tested rather than roll-your-own; however, this is not always possible.

In general, when building an oracle integration, it’s important to leverage as many independent security mechanisms as possible, minimizing the co-dependency each mechanism has with each other. By doing this, you minimize possible side effects from one mechanism failing and taking down the dependent mechanism.

Using Fair LP Pricing For CPMMs

A number of protocols over the years have been attacked due to incorrect implementations that fail to handle many of the edge cases present in AMM-sourced oracle prices. Alpha Homora has published a very thorough analysis of the types of exploits fair LP pricing prevents, as well as how to implement it.

Multi-Venue Price Sourcing

Due to the open nature of DeFi, it is not uncommon for one venue (i.e., Uniswap) to report prices that are slightly different than another venue (i.e., Sushiswap). Because DeFi relies on arbitrageurs to equalize these prices, and arbitraging is an inherently profit-driven act, the exact time between when a price discrepancy in one venue arises and when it is equalized to another venue/market rate can vary.

As such, it is important to source prices from multiple venues when possible, and to prefer high-liquidity venues, as using a venue with poor liquidity will likely result in less accurate pricing.
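One common way to combine readings from several venues is to take the median price and reject the reading outright if any single venue deviates too far from it. A minimal sketch of this idea; the threshold, names, and values are illustrative and not taken from any specific oracle implementation:

```rust
/// Median price across venues, rejected if any single venue deviates
/// from the median by more than `max_deviation_bps` basis points.
fn aggregate_price(mut prices: Vec<u64>, max_deviation_bps: u64) -> Option<u64> {
    if prices.is_empty() {
        return None;
    }
    prices.sort_unstable();
    let median = prices[prices.len() / 2];
    for p in &prices {
        // Treat the whole reading as unsafe if one venue disagrees strongly,
        // e.g. because its pool is being manipulated.
        if p.abs_diff(median) * 10_000 > median * max_deviation_bps {
            return None;
        }
    }
    Some(median)
}

fn main() {
    // Three venues in close agreement (within 2% = 200 bps): accept the median.
    assert_eq!(aggregate_price(vec![1_000, 1_010, 995], 200), Some(1_000));
    // One venue reports a manipulated price: reject the whole reading.
    assert_eq!(aggregate_price(vec![1_000, 1_010, 2_000], 200), None);
}
```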

4ciii. Further Resources

The literature on the risks and potential pitfalls of using oracles is extensive. Some notable examples can be found below:

4d. Alternative Mechanisms

4di. Speed Bumps or Timelocks

A speed bump involves delaying specific messages from being invoked for a certain period of time, with one of the most well-known examples likely being Compound Finance. For example, consider a DAO treasury that is sending 100k USDC of treasury reserve funds to a security auditor. Instead of the transfer of value taking place immediately upon transaction confirmation, it is delayed for a period X (e.g., 7 days). This gives the DAO the ability to revert the transaction in case a malicious actor submitted it.
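The speed-bump pattern can be sketched as a queue of pending transfers, each executable only after its delay elapses and cancellable in the meantime. All names and the 7-day delay are illustrative:

```rust
use std::collections::HashMap;

/// A pending transfer that only becomes executable after `eta`.
struct Pending {
    amount: u64,
    eta: u64, // earliest execution timestamp
}

struct SpeedBump {
    delay_secs: u64,
    queue: HashMap<u64, Pending>, // id -> pending transfer
    next_id: u64,
}

impl SpeedBump {
    fn schedule(&mut self, now: u64, amount: u64) -> u64 {
        let id = self.next_id;
        self.next_id += 1;
        self.queue.insert(id, Pending { amount, eta: now + self.delay_secs });
        id
    }

    /// Returns the amount if the delay has elapsed, None otherwise.
    fn execute(&mut self, now: u64, id: u64) -> Option<u64> {
        if self.queue.get(&id)?.eta > now {
            return None; // still inside the speed-bump window
        }
        self.queue.remove(&id).map(|p| p.amount)
    }

    /// Governance (or the sender) can cancel while the transfer is queued.
    fn cancel(&mut self, id: u64) -> bool {
        self.queue.remove(&id).is_some()
    }
}

fn main() {
    let mut sb = SpeedBump { delay_secs: 7 * 86_400, queue: HashMap::new(), next_id: 0 };
    let id = sb.schedule(0, 100_000);
    assert_eq!(sb.execute(1, id), None); // too early
    assert_eq!(sb.execute(7 * 86_400, id), Some(100_000)); // delay elapsed
    let id2 = sb.schedule(0, 100_000);
    assert!(sb.cancel(id2)); // reverted before execution
    assert_eq!(sb.execute(7 * 86_400, id2), None);
}
```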

Although speed bumps are more or less similar to the current Osmosis rate limits, they differ in the type of value used to trigger rate limits, with Osmosis rate limits using the value of inflows/outflows and speed bumps using the number of times a transfer message is sent.

Additionally, for all intents and purposes, the terms “speed bump” and “timelock” seem to be interchangeable in that the end result of speed bump limitations or timelock limitations are the same, resulting in delayed execution.

4dii. Large Transaction Delay

Large transaction delays are similar to timelocks, except that transactions are delayed by a static amount of time only when the transferred value is above a specific threshold. A detailed example of this can be found in the Wormhole governor documentation. For a more basic example, we can look at EigenLayer, which implements a static delay of 7 days for withdrawals.

Taking a look at the current implementation of Wormhole, we identify two main deficiencies that should be corrected:

  1. If off-chain prices are sourced to assess the value of assets being transferred, CoinGecko should never be used for this, nor for any mission-critical implementation where security is the goal.
  2. Implement a counter that tracks the value of assets sent by a specific sender in a given time period; once their total value transferred reaches a threshold, any further transfers in that period will be delayed.
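The per-sender counter from point 2 can be sketched as follows; the threshold, delay, and addresses are illustrative:

```rust
use std::collections::HashMap;

/// Track per-sender value within a period and delay any further
/// transfers once the sender's running total crosses a threshold.
struct SenderTracker {
    threshold: u64,
    delay_secs: u64,
    sent_in_period: HashMap<String, u64>, // sender -> value sent this period
}

impl SenderTracker {
    /// Returns the delay (in seconds) to apply to this transfer.
    fn record_and_delay(&mut self, sender: &str, value: u64) -> u64 {
        let total = self.sent_in_period.entry(sender.to_string()).or_insert(0);
        *total += value;
        if *total > self.threshold { self.delay_secs } else { 0 }
    }
}

fn main() {
    let mut t = SenderTracker { threshold: 10_000, delay_secs: 86_400, sent_in_period: HashMap::new() };
    // Transfers under the threshold execute immediately...
    assert_eq!(t.record_and_delay("osmo1sender", 6_000), 0);
    // ...but once the running total crosses it, further transfers are delayed.
    assert_eq!(t.record_and_delay("osmo1sender", 6_000), 86_400);
    // Other senders are tracked independently.
    assert_eq!(t.record_and_delay("osmo1other", 6_000), 0);
}
```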

PoC implementation

The Proof of Concept implementation for Large Transactions Delay can be found here.

4diii. Value-based Latency

Value-based latency is similar to large transaction delays in that the actual execution of the message is delayed. However, instead of using a fixed delay, we use a dynamic delay increasing in duration the greater the value that is being transferred.

For example, consider the following graph (plot [.in-line-code]e^x[.in-line-code] from [.in-line-code]x=0[.in-line-code] to [.in-line-code]10[.in-line-code]), with the X axis representing the delay in minutes from execution and the Y axis representing the value of a transfer:

Image 3: Exponential delay applied against the value of the transfer
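Reading the curve in the other direction (value = e^delay implies delay = ln(value)) gives the delay as a function of transfer value. A minimal sketch, assuming a logarithmic delay capped at the end of the plotted range; the cap and function name are illustrative:

```rust
/// Delay (in minutes) as the inverse of the exponential value curve:
/// with value = e^delay, the delay grows as ln(value), capped at 10 minutes.
fn delay_minutes(value: f64) -> f64 {
    value.max(1.0).ln().min(10.0)
}

fn main() {
    // Tiny transfers execute with no delay...
    assert_eq!(delay_minutes(1.0), 0.0);
    // ...a transfer of value e^5 waits roughly 5 minutes...
    assert!((delay_minutes(std::f64::consts::E.powi(5)) - 5.0).abs() < 1e-9);
    // ...and very large transfers hit the 10-minute cap.
    assert_eq!(delay_minutes(1e9), 10.0);
}
```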

PoC implementation in progress

The Proof of Concept implementation for Value-based Latency can be found here.

4div. Message-based Rate Limits

Osmosis currently uses per-denomination rate limits, which limit the amount of value that can flow in/out of Osmosis in a given time period but apply only to IBC messages. Alternatively, a more granular form of rate limit can be written, classified as message-based, applying to arbitrary Cosmos messages.

Given that the current per-denomination rate limits only apply to the inflow/outflow of a particular asset using IBC, the ability to provide security coverage for the entire chain is diminished, as messages sent locally on Osmosis itself, without transiting IBC channels, are excluded from rate limiting.

By introducing a second type of rate limit classified as “message-based” that allows for rate limits to apply to arbitrary messages based on their type URLs (i.e., [.in-line-code]/cosmos.circuit.v1.MsgAuthorizeCircuitBreaker[.in-line-code]), we can provide defense in depth for Osmosis, securing both IBC transfers as well as messages which are sent locally on Osmosis.

To facilitate automatic registration of message-based rules, whenever a new module is added or an existing module is extended with a new message type, an upgrade can be written in [.in-line-code]app/upgrades/vX/vX.go[.in-line-code], which invokes a sudo function to register the rules.
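A message-based rule keyed on type URLs could be sketched as follows; period rollover is omitted, and all names are illustrative:

```rust
use std::collections::HashMap;

/// Message-based rate limit: counts invocations per message type URL
/// within a period, independent of any token denomination.
struct MsgRateLimiter {
    limits: HashMap<String, u64>, // type URL -> max calls per period
    counts: HashMap<String, u64>, // type URL -> calls so far this period
}

impl MsgRateLimiter {
    fn try_execute(&mut self, type_url: &str) -> bool {
        let limit = match self.limits.get(type_url) {
            Some(&l) => l,
            None => return true, // no rule registered for this message type
        };
        let count = self.counts.entry(type_url.to_string()).or_insert(0);
        if *count >= limit {
            return false;
        }
        *count += 1;
        true
    }
}

fn main() {
    let mut rl = MsgRateLimiter { limits: HashMap::new(), counts: HashMap::new() };
    rl.limits.insert("/cosmos.circuit.v1.MsgAuthorizeCircuitBreaker".to_string(), 1);
    // The first invocation in the period passes, the second is blocked.
    assert!(rl.try_execute("/cosmos.circuit.v1.MsgAuthorizeCircuitBreaker"));
    assert!(!rl.try_execute("/cosmos.circuit.v1.MsgAuthorizeCircuitBreaker"));
    // Unregistered message types are unaffected.
    assert!(rl.try_execute("/cosmos.bank.v1beta1.MsgSend"));
}
```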

4e. Conditional Rate Limit Bypassing

In this section, we explore a set of mechanisms that apply conditional bypassing rules under certain conditions, such as address allowlists or holders of a particular token:

4ei. Sender-based Allowlist

Maintain a runtime-configurable mapping of addresses which, when sending messages, can bypass rate limit evaluation, with the permission resetting after the transfer is complete.

A similar mechanism is implemented in the Stride IBC rate limits implementation.
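The reset-after-use behavior can be sketched with a one-shot allowlist; the addresses here are illustrative:

```rust
use std::collections::HashSet;

/// One-shot sender allowlist: an allowlisted address bypasses rate limit
/// evaluation once, and the permission is cleared after use.
struct BypassAllowlist {
    allowed: HashSet<String>,
}

impl BypassAllowlist {
    /// Returns true (consuming the permission) if the sender may bypass.
    fn consume_bypass(&mut self, sender: &str) -> bool {
        self.allowed.remove(sender)
    }
}

fn main() {
    let mut list = BypassAllowlist { allowed: HashSet::new() };
    list.allowed.insert("osmo1trusted".to_string());
    // The first transfer bypasses the rate limit...
    assert!(list.consume_bypass("osmo1trusted"));
    // ...but the permission resets, so the next one is evaluated normally.
    assert!(!list.consume_bypass("osmo1trusted"));
    // Unknown senders never bypass.
    assert!(!list.consume_bypass("osmo1random"));
}
```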

4eii. NFT-gated Bypass

Allow senders of a message that hold an NFT to be excluded from rate limit analysis, with the eligible NFTs managed through governance. The NFT would need to be non-transferable (or, at the very least, burnable via governance), as a transferable NFT poses a security risk if the owner’s private key is compromised.

4eiii. Per-transaction Bypass

Allow senders to request the ability to bypass rate limit evaluation (or enforcement) for a particular transaction. While this is perhaps the most flexible bypass mechanism, it involves non-trivial amounts of manual review or the implementation of bespoke off-chain monitoring solutions. It also likely requires some sort of UI to make it easier to review/approve requests, if the review process is intended to be community-driven.

4eiv. Initial Grace Period

Either on a per-sender, per-channel, or per-denomination basis, allow up to X amount to be transferred at the start of a period before allowing rate limits to kick in.
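One way to realize this is to treat the first X transferred in a period as exempt and count only the excess against the quota; a minimal sketch with illustrative values:

```rust
/// Initial grace period: the first `grace_amount` transferred after a
/// period starts is exempt; only the excess counts against the quota.
fn amount_subject_to_limit(total_sent: u64, amount: u64, grace_amount: u64) -> u64 {
    let grace_left = grace_amount.saturating_sub(total_sent);
    amount.saturating_sub(grace_left)
}

fn main() {
    // With a 10k grace amount, the first 8k is fully exempt...
    assert_eq!(amount_subject_to_limit(0, 8_000, 10_000), 0);
    // ...the next 5k uses the remaining 2k of grace and counts 3k...
    assert_eq!(amount_subject_to_limit(8_000, 5_000, 10_000), 3_000);
    // ...and once grace is exhausted, transfers count in full.
    assert_eq!(amount_subject_to_limit(13_000, 4_000, 10_000), 4_000);
}
```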

Appendix

Implementation Notes

Handling Module Upgrades

For Cosmos-SDK modules that need to be upgraded, “In Place Store Migrations” can potentially be used; however, a custom migration would likely be needed to migrate existing rate limits to the new format.

Handling CosmWasm Upgrades

For CosmWasm contract upgrades, anytime object fields are changed, a storage migration needs to be implemented via the [.in-line-code]MsgMigrate[.in-line-code] message. Depending on the number of rate limits that need to be migrated, a multi-stage migration may be needed to avoid running into OOG (out-of-gas) errors. For more information on this style of migration see here, while a basic single-stage migration can be seen here. More information can be found in the official CosmWasm repository.

Based on Lido’s Terra integration, it appears that the most effective way of extending stored object state with additional fields is to add each additional field as an [.in-line-code]Option<T>[.in-line-code].

PoCs

As part of the R&D efforts behind this report, we’ve implemented the following proofs of concept of potential improvements to the current Osmosis IBC rate limits implementation:

About Range

Range builds security infrastructure for sovereign blockchains and rollups, with a focus on the Cosmos ecosystem and bridges such as the Inter-Blockchain Communication Protocol (IBC). Range's product suite encompasses tools for monitoring, threat detection and prevention, analytics, and forensics in order to strengthen the security of the interchain and modular ecosystems.