A Rate-limited Sidekiq Job - Part 1

Problem: A background task needs to hit an external service, but not too frequently

Solution: Use the ratelimit gem

What do we do when the limit has been met?

Recently, I’ve been working on the above. I thought I'd write about the initial solution I tried, which is fine, but doesn't quite scale if you are frequently hitting the limit. The next post will address an alternate take.

TL;DR: raise an exception if the rate limit has been met, and let Sidekiq queue the job for later.

Sidekiq has a great feature: failed jobs are automatically re-queued and retried.

Some details on the rate limits involved:

We want to hit the external service (Strava API) a maximum of 600 times in a fifteen-minute period (900 seconds).

First, we build the basic Sidekiq job, which retries every fifteen minutes after a failure (to let the rate limit replenish):

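class StravaSyncUserJob
  include Sidekiq::Worker

  # Wait a fixed 15 minutes between retries instead of Sidekiq's default backoff.
  # Sidekiq expects the delay in seconds; 15.minutes relies on ActiveSupport.
  sidekiq_retry_in do |_count|
    15.minutes.to_i
  end

  def perform(user_id)
    # Implemented below
  end
end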

Now if our StravaSyncUserJob#perform implementation raises, the task will be shelved for a later attempt. Let’s configure the rate limiting with the ratelimit gem.

In our Gemfile:

gem 'ratelimit'

Then install the gem with bundle install.

Now we’ll set up some values:

RL_SUBJECT = "users" # Just a way to separate different ratelimit counts, can be any string in our case

# 600 hits per 15 minutes
RL_THRESHOLD = 600
RL_INTERVAL = 15.minutes

And implement our Sidekiq perform method:

def perform(user_id)
  ratelimited do
    fetch_strava_data(user_id)
  end
end

Creating the ratelimit object is easy: we give it a unique key and, if we already have one for other purposes, an instance of the Redis client:

def ratelimit
  @ratelimit ||= Ratelimit.new(
    "strava_sync",
    redis: $redis # We are already using Redis elsewhere in the app. If you aren't, leave out this parameter
  )
end

Next, we’ll implement the ratelimited method, which accepts a block and only calls it if the service has not exceeded the limit.

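def ratelimited(&block)
  # Bail out (and let Sidekiq retry later) once we are at the threshold
  raise "Ratelimit met" if ratelimit.exceeded?(RL_SUBJECT, interval: RL_INTERVAL, threshold: RL_THRESHOLD)
  ratelimit.add(RL_SUBJECT) # Record this hit so it counts towards the limit
  block.call
end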

Above, you can see we implicitly raise a RuntimeError with a message. This exception will trigger Sidekiq to re-queue the job and retry it in 15 minutes.
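One optional refinement, which is not what the job above does: raising a dedicated exception class instead of a bare RuntimeError makes rate-limit retries easy to tell apart from genuine failures in logs. The following is only a sketch. RateLimitExceeded and FakeLimiter are hypothetical names; FakeLimiter is an in-memory stand-in for the Redis-backed limiter so the example runs without Redis, and it ignores the time window entirely.

```ruby
# Hypothetical error class; the name is illustrative and is not part of
# Sidekiq or the ratelimit gem.
class RateLimitExceeded < StandardError; end

# Hypothetical in-memory stand-in for the Redis-backed limiter.
# It only counts hits per subject and ignores the time window.
class FakeLimiter
  def initialize
    @hits = Hash.new(0)
  end

  def add(subject)
    @hits[subject] += 1
  end

  def exceeded?(subject, threshold:)
    @hits[subject] >= threshold
  end
end

# Runs the block only while the subject is under the threshold; otherwise
# raises so the caller (Sidekiq, in this post) can retry the job later.
def ratelimited(limiter, subject, threshold:)
  raise RateLimitExceeded, "Ratelimit met for #{subject}" if limiter.exceeded?(subject, threshold: threshold)
  limiter.add(subject)
  yield
end

limiter = FakeLimiter.new
results = 3.times.map do |i|
  begin
    ratelimited(limiter, "users", threshold: 2) { "request #{i} sent" }
  rescue RateLimitExceeded => e
    "request #{i} deferred: #{e.message}"
  end
end
puts results
```

With a threshold of 2, the first two calls run their block and the third is deferred. In a real job the rescue would be left out, so the exception reaches Sidekiq and triggers the retry.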

Earlier, I hinted that this was not the solution we ended up with (I’ll write that up in the next post). We have somewhere around 6000 users for whom we want a twice-daily sync of Strava trips. The jobs are short (many requests do little work, as not all users will have recorded a ride since our last check), so we quickly hit the ceiling of 600 requests per 15 minutes, and perhaps a majority of the jobs end up re-queued by Sidekiq. This is fine in the sense that they will get serviced eventually, and Sidekiq is very efficient at its work. But having thousands of jobs requeued as the norm (not the exception) feels wrong.
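To get a feel for the scale, here is a back-of-the-envelope estimate for a single sync run. It is a rough sketch under simplifying assumptions that are mine, not measurements: every job makes exactly one API request, all 6000 jobs are enqueued at once, and each 15-minute window lets exactly 600 jobs through while every other waiting job fails once and is retried.

```ruby
# Rough estimate of retries for one sync run under the assumptions above.
users = 6000          # approximate user count from the post
threshold = 600       # requests allowed per window
window_minutes = 15   # length of the rate-limit window

remaining = users
windows = 0
retries = 0

while remaining > 0
  served = [remaining, threshold].min
  remaining -= served
  retries += remaining # every job still waiting fails once in this window
  windows += 1
end

puts "windows needed: #{windows}"
puts "minutes to drain: #{windows * window_minutes}"
puts "total retries: #{retries}"
```

Under those assumptions a single run needs 10 windows (150 minutes) to drain and produces 27000 failed attempts along the way, which is consistent with retries being the norm rather than the exception.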