This is the 3rd and final part of our research series on IBC Rate Limits. If you haven't read Parts 1 and 2 yet, we strongly encourage reading them first to get the most out of this post. Below you can find the Table of Contents of the series:
Now, let's get started with Part 3, where we explore new mechanisms and improvements to the current IBC Rate Limits implementation based on the data-driven analysis and frameworks that we developed in Part 2. It should be noted that the goal of this research post is to explore the different directions available, not to advocate for or recommend a specific feature to be implemented in the particular case of Osmosis. If you want to see our recommendations for Osmosis IBC Rate Limits, check out the last section of Part 2.
Since this section is dense in content, we provide below a table of contents to ease digestion of the material:
4a. Automatic Quotas as Safety Backstop
4b. Mitigating Boundary Attacks
i. Automatic Period Rollovers
ii. Two Period Average
iii. Decay N-period Average
4c. Notional Value Rate Limits
i. Oracle risks
ii. Safety Mechanisms for Integrating Oracles
iii. Further resources
4d. Alternative Mechanisms
i. Speed Bumps or Timelocks
ii. Large Transaction Delay
iii. Value-based Latency
iv. Message-based Rate Limits
4e. Conditional Rate Limit Bypassing
i. Sender-based Allowlist
ii. NFT-gated Bypass
iii. Per-transaction Bypass
iv. Initial Grace Period
The current implementation of Osmosis IBC rate limits is the first step to ensuring cross-chain safety. In this section, we explore more advanced mechanisms and potential alternatives to improve the current implementation and analyze their potential challenges and pitfalls.
Currently, a governance proposal is needed to create or update the rate limit and quota of a given channel. Thus, when a new IBC channel is created, assets will be able to flow freely to and from Osmosis (except for denoms that are rate limited on [.in-line-code]channel=any[.in-line-code]). Until a governance proposal gets approved and executed, the newly created channel won’t have any type of rate limit, diminishing the effectiveness of the safety mechanism.
By enabling automatic registration of rate limits with default values, the overall security of Osmosis can be increased, avoiding the delays incurred when governance and external processes need to create rate limits every time a new message, denom, or channel is added.
Enabling default rate limit registration will consist of adding a new variant to the [.in-line-code]Sudo[.in-line-code] message such as [.in-line-code]RegisterDefaultLimits[.in-line-code], while adding the following function to handle rate limit creation:
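As a rough illustration of what such a handler could look like, below is a simplified sketch in plain Rust. All types, field names, and default values here are assumptions for illustration, not the actual Osmosis contract code (which operates on CosmWasm storage and message types):

```rust
// Simplified sketch of default rate limit registration. Types, fields, and
// the default quota values are illustrative, not the actual contract code.

#[derive(Debug, Clone)]
struct QuotaMsg {
    name: String,
    duration_secs: u64,
    send_recv: (u32, u32), // percentage thresholds (send, receive)
}

#[derive(Debug, Clone)]
struct RateLimit {
    channel: String,
    denom: String,
    quotas: Vec<QuotaMsg>,
}

/// Registers a conservative default rate limit for a (channel, denom) pair
/// unless one already exists, mirroring the idea behind a
/// `RegisterDefaultLimits` sudo variant. Returns true if a limit was added.
fn register_default_limits(
    existing: &mut Vec<RateLimit>,
    channel: &str,
    denom: &str,
) -> bool {
    if existing
        .iter()
        .any(|r| r.channel == channel && r.denom == denom)
    {
        return false; // already configured; leave governance-set values alone
    }
    existing.push(RateLimit {
        channel: channel.to_string(),
        denom: denom.to_string(),
        quotas: vec![QuotaMsg {
            name: "default-daily".to_string(),
            duration_secs: 86_400,
            send_recv: (30, 30), // illustrative conservative default
        }],
    });
    true
}
```

Because registration is idempotent, the chain can safely invoke it every time a new channel or denom appears; governance can later replace the defaults with tuned values.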
Full PoC Implementation
The proof of concept implementation for default rate limit creations can be found here: https://github.com/teamscanworks/ibc-rate-limits/pull/5
The current Osmosis IBC rate limit design is susceptible to boundary attacks, which are a form of attack that attempts to circumvent rate limits by conducting attacks shortly before a time period ends and concluding the attack after the new time period begins, thus practically bypassing the intended rate limit threshold. This is possible because values tracked by the rate limits reset to 0 when a new period begins.
Example Of A Boundary Attack
An attacker has identified a vulnerability allowing 200k USDC to be minted on Osmosis, which may be transferred through IBC to another chain. A rate limit is configured, allowing for a maximum of 100k tokens to be transferred during any given window. Such a scenario leads to two possible options for exfiltrating the falsely minted USDC:
Current rate limits use time periods that are not automatically rolled over upon expiration of the current period, instead relying on evaluation of the rate limit to trigger rollover. This is performed whenever the rate limit [.in-line-code]allow_transfer[.in-line-code] method is invoked. Upon function invocation, the current flow is cached, the rate limit is checked for period expiration, and, if needed, channel values are recalculated along with a period rollover.
By changing this design such that period rollover can only take place via a permissioned Sudo message executed by the chain, we allow for automatic rollovers that take place during the [.in-line-code]BeginBlock[.in-line-code] or [.in-line-code]EndBlock[.in-line-code] stages of IBC middleware. Such a mechanism also allows for more flexible rate limit configurations while also providing the framework for enabling more complex rules.
Notably, the current lack of rolling time windows can adversely affect UX, especially during periods of high volatility, as it becomes possible for “stale” values to be cached in the rate limit storage. The proposed design also theoretically reduces the attack surface, as an attacker is no longer able to influence when the rate limit is rolled over.
The current rate limit design uses a period rollover process, which can only be triggered when the rate limit is evaluated. The alternative option is to enable automatic period rollover, which can be triggered by chain-based keepers. Although such a design doesn’t explicitly fix boundary attacks, it allows for the implementation of more robust boundary attack mitigation strategies.
Furthermore, if the period rollover is limited only to chain keepers, we can apply the rollover during the [.in-line-code]EndBlock[.in-line-code] steps of the IBC middleware, thus ensuring that transactions executing in a block are evaluated using values that have not been reset. Such an implementation can potentially increase the difficulty of conducting boundary attacks, as the attacker is not able to reset the rate limit before the rate limit evaluation.
Automatic rollovers can be implemented in two different ways, using either the [.in-line-code]BeginBlock[.in-line-code] or [.in-line-code]EndBlock[.in-line-code] functions in [.in-line-code]x/ibc-rate-limit/ibcratelimitmodule/module.go[.in-line-code], together with an update to the rate-limiter contract.
The safest option for handling the rollover process is likely triggering it in the [.in-line-code]EndBlock[.in-line-code] function, since even when the period expires, transactions in that block are still evaluated before the tracked value is reset to 0.
The rate limiter contract [.in-line-code]Sudo[.in-line-code] message type can be extended to include a new variant (i.e., [.in-line-code]RolloverRules[.in-line-code]), which, when called, executes logic similar to the following:
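A simplified sketch of that rollover logic is shown below. The real contract operates on CosmWasm storage and authorization; here a plain struct stands in for a tracked flow, and all names are illustrative:

```rust
// Illustrative sketch of a permissioned rollover handler. A plain struct
// stands in for the contract's stored flow state; names are assumptions.

#[derive(Debug, Clone)]
struct Flow {
    inflow: u128,
    outflow: u128,
    period_end: u64,  // unix seconds at which the current period expires
    period_secs: u64, // configured period duration
}

impl Flow {
    /// Rolls the flow into a new period if the current one has expired.
    /// Returns true when a rollover actually happened.
    fn rollover_if_expired(&mut self, now: u64) -> bool {
        if now < self.period_end {
            return false;
        }
        self.inflow = 0;
        self.outflow = 0;
        // advance the period boundary past `now` so the new window is aligned
        while self.period_end <= now {
            self.period_end += self.period_secs;
        }
        true
    }
}

/// Sudo-style entry point: only the chain (not users) would be allowed to
/// call this, iterating every tracked flow at BeginBlock/EndBlock.
fn rollover_rules(flows: &mut [Flow], now: u64) -> usize {
    let mut rolled = 0;
    for f in flows.iter_mut() {
        if f.rollover_if_expired(now) {
            rolled += 1;
        }
    }
    rolled
}
```

Because only the chain can invoke this entry point, users can no longer time their transactions to force a reset mid-block.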
Then, update the [.in-line-code]BeginBlock[.in-line-code] or [.in-line-code]EndBlock[.in-line-code] functions to execute logic similar to
Full PoC Implementation
The proof of concept implementation of Automatic Rollovers can be found here: https://github.com/teamscanworks/ibc-rate-limits/pull/3
The simplest option for minimizing the ability to conduct boundary attacks is by averaging out the value tracked by a rate limit across two different time periods, which we’ll term A and B. By averaging out the values from A with the current tracked value in B, the impact of the tracked value being reset does not result in rate limit evaluation starting at 0, reducing the potential scope of damage.
Consider the following example of using a two-period average that limits transfers of OSMO to 100k in a single period. In period A, the attacker transfers 100k OSMO. When the period ends, instead of being able to transfer an additional 100k OSMO tokens, only 50k OSMO tokens can be transferred. With the current rate limit design, the attacker would be able to transfer 100k OSMO, since unaveraged values are used, thus allowing for full capacity in period B.
Although this is possible to implement without automatic period rollovers, relying on rate limit evaluation to trigger period rollovers performs poorly in times of high volatility.
Two-period Average Implementation
By extending the rate limit object to the following, we have the ability to store inflow, outflow, and channel values from period A when rolling over into period B:
In order to calculate the averaged values of capacity, we can add the following functions:
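A self-contained sketch of the extended object and the averaged capacity calculation is shown below. Field names are illustrative, and the exact way period A's value is combined with period B's is an assumption chosen to match the worked example above (100k transferred in period A leaves 50k of a 100k quota at the start of period B):

```rust
// Sketch of a rate limit object extended with the previous period's values.
// Names and the combination formula are illustrative assumptions.

#[derive(Debug, Clone)]
struct TwoPeriodFlow {
    inflow: u128,
    outflow: u128,
    // values captured from period A when rolling over into period B
    prev_inflow: u128,
    prev_outflow: u128,
}

impl TwoPeriodFlow {
    /// Effective outflow: half of period A's final value carries over into
    /// period B, so evaluation does not restart from 0 at the boundary.
    fn effective_outflow(&self) -> u128 {
        self.prev_outflow / 2 + self.outflow
    }

    fn effective_inflow(&self) -> u128 {
        self.prev_inflow / 2 + self.inflow
    }

    /// Remaining send capacity against a per-period quota.
    fn remaining_send_capacity(&self, quota: u128) -> u128 {
        quota.saturating_sub(self.effective_outflow())
    }
}
```

With a 100k quota and 100k transferred in period A, period B starts with only 50k of capacity, matching the example.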
A significant disadvantage of the two-period average method is that it has very poor UX in periods of high volatility, even with automatic period rollovers, due to the static value used from period A. To address this, a decaying function can be applied to the value from period A before it is used to average out against period B. The UX is improved because the impact of high volatility on subsequent time periods decreases the further into a period the rate limit is. At the start of period B, the value of period A is used as is, progressing to 0 at the end of period B.
A slightly more advanced implementation of the two aforementioned averaging strategies would be to use a decaying function that decays a value to 0 over N periods. For example, a decay four-period average would decay a value from the end of period A to 0 by the end of period E.
To showcase this design, we implement the simplest version of the Decay N-period Average, with [.in-line-code]N=2[.in-line-code]:
Decay Two Period Average Implementation
Building upon the two-period example, we can further extend the rate limit object with the following:
By adding the following function, we can calculate how far into the current period we are and use the output as the parameter used when calculating the decayed value:
To calculate the decayed channel value, we can use a function similar to the following:
Then to calculate the decayed two-period average:
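The pieces above can be sketched together as follows. This is a simplified linear-decay illustration in plain Rust, not the PoC code: field names, the use of floating point for the decay fraction, and the way the decayed value is combined with period B's accumulation are all assumptions:

```rust
// Sketch of a decay two-period average: period A's value is linearly decayed
// over the course of period B before being combined with B's accumulation.
// Names and formulas are illustrative assumptions.

#[derive(Debug, Clone)]
struct DecayFlow {
    outflow: u128,      // value accumulated in the current period (B)
    prev_outflow: u128, // final value of the previous period (A)
    period_start: u64,  // unix seconds when the current period began
    period_secs: u64,
}

impl DecayFlow {
    /// Fraction of the current period that has elapsed, in [0, 1].
    fn period_progress(&self, now: u64) -> f64 {
        let elapsed = now.saturating_sub(self.period_start).min(self.period_secs);
        elapsed as f64 / self.period_secs as f64
    }

    /// Period A's value, linearly decayed: used as-is at the start of B,
    /// reaching 0 at the end of B.
    fn decayed_prev(&self, now: u64) -> u128 {
        let remaining = 1.0 - self.period_progress(now);
        (self.prev_outflow as f64 * remaining) as u128
    }

    /// Effective tracked value: the decayed A value is combined with the
    /// current B accumulation the same way as in the plain two-period average.
    fn effective_outflow(&self, now: u64) -> u128 {
        self.decayed_prev(now) / 2 + self.outflow
    }
}
```

Halfway through period B, only half of period A's value still weighs on the limit, so a volatile previous period stops penalizing users progressively rather than all at once.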
Full PoC Implementation
The proof of concept implementation of rolling time periods with decay two-period average can be found here: https://github.com/teamscanworks/ibc-rate-limits/pull/4
Existing rate limit documentation suggests the intention of using USDC prices of assets and total dollar value to limit transfers. In this section, we explore the challenges of relying on oracles to achieve a notional value rate limit implementation.
To implement USDC-denominated rate limits based on oracles, great care needs to be taken to ensure the fitness of the prices sourced via the oracle.
For example, prices sourced from CLMMs are not well suited for reliable price tracking, particularly in periods of high volatility, with a level of security that decreases exponentially faster than that of CPMM oracles as the available liquidity decreases:
This is a bigger problem for Uniswap’s V3 TWAP compared to V2 since liquidity providers aren’t incentivised to provide full-range liquidity, in fact, they’re incentivised to put it in narrower ranges that can earn more fees. Exploiters can wait for a dump or pump that would place the current price past a concentrated mass of liquidity, and thus more easily push and pull the price in their preferred direction afterwards.
In general, oracle implementations (especially CLMM-based) need to be extremely robust and well-audited due to the wide variety of exploits that can be introduced. Some resources are included below which detail the security risks of oracles, notably CLMM-based:
When possible, it’s best to use pre-existing oracle solutions that are battle-tested rather than rolling your own; however, this is not always possible.
In general, when building an oracle integration, it’s important to leverage as many independent security mechanisms as possible, minimizing the dependencies between them. By doing this, you limit the side effects of one mechanism failing and taking a dependent mechanism down with it.
Using Fair LP Pricing For CPMMs
A number of protocols over the years have been attacked due to incorrect implementation and failing to handle many of the edge cases present in AMM-sourced Oracle prices. Alpha Homora has published a very thorough analysis of the type of exploits this prevents, as well as how to implement them.
Multi-Venue Price Sourcing
Due to the open nature of DeFi, it is not uncommon for one venue (i.e., Uniswap) to report prices that are slightly different than another venue (i.e., Sushiswap). Because DeFi relies on arbitrageurs to equalize these prices, and arbitraging is an inherently profit-driven act, the exact time between when a price discrepancy in one venue arises and when it is equalized to another venue/market rate can vary.
As such, it is important to source prices from multiple high-liquidity venues when possible; using a venue with poor liquidity will likely result in less accurate pricing.
The literature on the risks and potential pitfalls of using oracles is extensive. Some notable examples can be found below:
A speed bump involves delaying specific messages from being invoked for a certain period of time; one of the most well-known examples is likely Compound Finance. For example, consider a DAO treasury that is sending 100k USDC of treasury reserve funds to a security auditor. Instead of the transfer of value taking place immediately upon transaction confirmation, it is delayed for X time period (i.e., 7 days). This gives the DAO the ability to revert the transaction in case a malicious actor has submitted it.
Although speed bumps are broadly similar to the current Osmosis rate limits, they differ in the type of value used to trigger them: Osmosis rate limits use the value of inflows/outflows, while speed bumps use the number of times a transfer message is sent.
Additionally, for all intents and purposes, the terms “speed bump” and “timelock” are interchangeable, in that the end result of either limitation is the same: delayed execution.
Large transaction delays are similar to timelocks, except that transactions are delayed by a static amount of time only when their value is above a specific threshold. A detailed example of this can be found in the Wormhole governor documentation. However, for a more basic example, we can look at EigenLayer, which implements a static delay of 7 days for withdrawals.
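The decision logic is simple enough to sketch in a few lines. The threshold and delay constants below are illustrative, not values from Wormhole or EigenLayer:

```rust
// Sketch of a large-transaction delay: transfers at or above a threshold are
// queued with a static delay instead of executing immediately.

#[derive(Debug, PartialEq)]
enum Decision {
    ExecuteNow,
    DelayUntil(u64), // unix seconds at which execution becomes allowed
}

/// Classifies a transfer: small transfers execute immediately, large ones
/// are held for a fixed delay starting from `now`.
fn classify_transfer(value: u128, threshold: u128, delay_secs: u64, now: u64) -> Decision {
    if value >= threshold {
        Decision::DelayUntil(now + delay_secs)
    } else {
        Decision::ExecuteNow
    }
}
```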
Taking a look at the current implementation of Wormhole, we identify two main deficiencies that should be corrected:
The Proof of Concept implementation for Large Transactions Delay can be found here.
Value-based latency is similar to large transaction delays in that the actual execution of the message is delayed. However, instead of using a fixed delay, we use a dynamic delay increasing in duration the greater the value that is being transferred.
For example, consider the following graph (plot [.in-line-code]e^x[.in-line-code] from [.in-line-code]x=0[.in-line-code] to [.in-line-code]10[.in-line-code]), with the X axis representing the delay in minutes from execution and the Y axis representing the value of a transfer:
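Reading the curve the other way: if value grows as e^delay, the delay for a given transfer is the natural log of its value, clamped to the plotted range. A small sketch of this mapping (the unit scaling and clamp bounds are illustrative assumptions):

```rust
// Sketch of a value-based delay derived from the e^x curve: with
// value = e^delay, the delay in minutes is ln(value), clamped to [0, 10].

/// Delay in minutes for a transfer of the given (normalized) value.
fn delay_minutes(value: f64) -> f64 {
    if value <= 1.0 {
        return 0.0; // small transfers execute immediately
    }
    value.ln().min(10.0) // cap so even huge transfers are eventually executed
}
```

A logarithmic mapping like this keeps small transfers instant while forcing very large transfers to wait long enough for monitoring or intervention.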
PoC implementation in progress
The Proof of Concept implementation for Value-based Latency can be found here.
Osmosis currently uses per-denomination rate limits, which limit the amount of value that can flow in/out of Osmosis in a given time period, but are limited to IBC messages. Alternatively, a more granular form of rate limit can be written, classified as message-based, which applies to arbitrary Cosmos messages.
Given that the current per-denomination rate limits only apply to the inflow/outflow of a particular asset using IBC, the ability to provide security coverage for the entire chain is diminished, as messages sent locally on Osmosis itself, which do not transit IBC channels, are excluded from rate limiting.
By introducing a second type of rate limit classified as “message-based” that allows rate limits to apply to arbitrary messages based on their type URLs (e.g., [.in-line-code]/cosmos.circuit.v1.MsgAuthorizeCircuitBreaker[.in-line-code]), we can provide defense in depth for Osmosis, securing both IBC transfers and messages which are sent locally on Osmosis.
To facilitate automatic registration of message-based rules, whenever a new module is added or an existing module is extended with a new message type, an upgrade can be written in [.in-line-code]app/upgrades/vX/vX.go[.in-line-code], which invokes a sudo function to register the rules.
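A message-based rule can be as simple as a per-period invocation counter keyed on the message type URL. The sketch below is illustrative (the struct, limits, and reset hook are assumptions, not the Osmosis module API):

```rust
// Sketch of a message-based rate limit keyed on message type URLs, counting
// invocations per period. Limits and names are illustrative.

use std::collections::HashMap;

struct MsgRateLimiter {
    limits: HashMap<String, u64>, // type URL -> max invocations per period
    counts: HashMap<String, u64>, // type URL -> invocations so far
}

impl MsgRateLimiter {
    fn new() -> Self {
        Self { limits: HashMap::new(), counts: HashMap::new() }
    }

    fn set_limit(&mut self, type_url: &str, max_per_period: u64) {
        self.limits.insert(type_url.to_string(), max_per_period);
    }

    /// Returns true if the message may proceed; URLs without a configured
    /// limit always pass.
    fn allow(&mut self, type_url: &str) -> bool {
        let max = match self.limits.get(type_url) {
            Some(&m) => m,
            None => return true,
        };
        let count = self.counts.entry(type_url.to_string()).or_insert(0);
        if *count >= max {
            return false;
        }
        *count += 1;
        true
    }

    /// Called on period rollover (e.g. from an EndBlock hook).
    fn reset(&mut self) {
        self.counts.clear();
    }
}
```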
In this section, we explore a set of mechanisms that apply conditional bypassing rules under certain conditions, such as address allowlists or holders of a particular token:
Allow a runtime configurable mapping of addresses to be maintained, which, when sending messages, can bypass rate evaluation, resetting the permission after the transfer is complete.
A similar mechanism is implemented in the Stride IBC rate limits implementation.
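The one-shot nature of the permission can be captured in a few lines. This sketch is illustrative only (addresses are simplified to strings, and the grant function would be governance- or admin-gated in a real deployment):

```rust
// Sketch of a sender allowlist whose bypass permission is consumed after a
// single transfer, so it must be re-granted for each use.

use std::collections::HashSet;

struct Allowlist {
    senders: HashSet<String>,
}

impl Allowlist {
    fn new() -> Self {
        Self { senders: HashSet::new() }
    }

    /// Governance- or admin-gated in a real deployment.
    fn grant(&mut self, sender: &str) {
        self.senders.insert(sender.to_string());
    }

    /// Returns true if the sender may bypass rate limit evaluation, and
    /// consumes the permission in the same step.
    fn consume_bypass(&mut self, sender: &str) -> bool {
        self.senders.remove(sender)
    }
}
```

Consuming the permission on use limits the blast radius of a compromised allowlisted key to a single bypassed transfer.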
Allow senders of a message who hold a designated NFT to be excluded from rate limit analysis, with the eligible NFTs managed through governance. The NFT would need to be non-transferrable (or, at the very least, burnable via governance), as it could pose a security risk if the owner’s private key were compromised.
Allow senders to request the ability to bypass rate limit evaluation (or enforcement) for a particular transaction. While this is perhaps the most flexible bypass mechanism, it involves a non-trivial amount of manual review or the implementation of bespoke off-chain monitoring solutions. It also likely requires some sort of UI to make it easier to review/approve requests, should the review process be community-driven.
Either on a per-sender, per-channel, or per-denomination basis, allow up to X amount to be transferred at the start of a period before allowing rate limits to kick in.
For Cosmos-SDK modules that need to be upgraded, “In Place Store Migrations” can potentially be used, however, a custom migration would likely be needed to migrate existing rate limits to new rate limits.
For CosmWasm contract upgrades, anytime object fields are changed, a storage migration needs to be implemented via the [.in-line-code]MsgMigrate[.in-line-code] message. Depending on the number of rate limits that need to be migrated, a multi-stage migration may be needed to avoid running into OOG (out-of-gas) errors. For more information on this style of migration see here, while a basic single-stage migration can be seen here. More information can be found in the official CosmWasm repository.
Based on Lido’s Terra integration, it appears that the most effective way of altering the object state by introducing additional fields is to add each additional field with a type [.in-line-code]Option<T>[.in-line-code].
As part of the R&D efforts behind this report, we’ve implemented the following proofs of concept of potential improvements to the current Osmosis IBC rate limits implementation:
Range builds security infrastructure for sovereign blockchains and rollups, with a focus on the Cosmos ecosystem and bridges such as the Inter-Blockchain Communication Protocol (IBC). Range's product suite encompasses tools for monitoring, threat detection and prevention, analytics, and forensics in order to strengthen the security of the interchain and modular ecosystems.