Introduction
When it comes time to deploy your Tensorflow / Keras model to Heroku, quite a few issues can arise. After spending many hours sifting through StackOverflow posts while deploying my own Flask web app to Heroku, I decided to compile the errors I encountered here. Hopefully I can save you some time troubleshooting your own errors.
If you haven’t written your web app yet, I wrote a tutorial on building a flask web app for keras models.
Article Overview
- Failed building wheel for h5py heroku
- Slug size too large deploying keras heroku
- Errors importing audio library librosa on heroku
- Heroku 30 second timeout workaround python
- Redis worker can’t find file in tmp directory on heroku
How to fix "ERROR: Failed building wheel for h5py" when installing Keras to Heroku?
Simply updating your versions of pip, setuptools, and wheel should resolve this error.
pip install --upgrade pip setuptools wheel
How to fix Slug size too large error on Heroku when using Tensorflow in flask application?
If you include the full tensorflow package in your requirements.txt, you will encounter this error. Tensorflow 2.0 ships with GPU support, which by itself pushes you past the 500MB slug size limit Heroku allows for your app and all of its dependencies. Heroku dynos don't have GPUs anyway, so you can install the CPU-only build instead.
Simply replace tensorflow with tensorflow-cpu in your requirements.
If this is still not enough to get you under the 500MB limit, consider using tensorflow version 1.14, since it is roughly half the size of tensorflow 2.0+ and can do almost all of the same things.
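For reference, the relevant part of a requirements.txt using the CPU-only build might look like this (the packages and pinned version here are illustrative, not the article's exact file):

```
flask
gunicorn
tensorflow-cpu==2.3.0
```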
How to fix errors including librosa audio processing library in Heroku requirements.txt?
The python library librosa uses the soundfile library as a dependency, and soundfile in turn relies on libsndfile, a system library that cannot be installed with pip. It needs to be installed through apt, the package manager for Ubuntu-based systems like Heroku's stack. To use librosa on Heroku properly, create an Aptfile in your root directory with the following content.
libsndfile1
libsndfile-dev
libasound2-dev
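Note that Heroku only reads an Aptfile if the community apt buildpack is enabled. Assuming you have the Heroku CLI installed, you can add it ahead of the Python buildpack with:

```
heroku buildpacks:add --index 1 heroku-community/apt
```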
How to make long requests and get around the 30 second timeout on Heroku?
If you are doing any expensive computation like feature extraction, your request will likely time out. Thankfully, there is an elegant solution: return a response immediately and run the expensive work as a background job, working around the 30 second limit on Heroku requests.
Using Redis, we can set up a worker that will execute our expensive computation in the background. The initial request will return immediately, and we will periodically check on the status of the actual computation from the javascript code. All credit for this clever solution goes to Gokul Viswanath.
First, create the file worker.py in the root directory with the following code:
# Before executing locally, enter this in a terminal:
# source venv/bin/activate        (for virtualenv)
# sudo service redis-server start (to start the Redis service)
import os

import redis
from rq import Worker, Queue, Connection

listen = ['high', 'default', 'low']

redis_url = os.getenv('REDISTOGO_URL', 'redis://localhost:6379')
conn = redis.from_url(redis_url)

if __name__ == '__main__':
    with Connection(conn):
        worker = Worker(map(Queue, listen))
        worker.work()
Next, add the following line to your Procfile:
worker: python worker.py
Now, in app.py or whatever your equivalent file is, we will define the following route:
@app.route('/tasks/<taskID>', methods=['GET'])
def get_status(taskID):
    # q is the rq Queue created alongside the Flask app
    task = q.fetch_job(taskID)
    # If such a job exists, return its info
    if task:
        responseObject = {
            "status": "success",
            "data": {
                "taskID": task.get_id(),
                "taskStatus": task.get_status(),
                "taskResult": task.result
            }
        }
    # Else, return an error
    else:
        responseObject = {"status": "error"}
    return jsonify(responseObject)
Now, in your main.js or equivalent file, you can define the following function:
function get_status(taskID, funcToCall) {
  $.ajax({
    method: "GET",
    url: `tasks/${taskID}`,
  })
    .done((response) => {
      const taskStatus = response.data.taskStatus;
      if (taskStatus === "failed") {
        console.log(response);
        return false;
      } else if (taskStatus === "finished") {
        // The job is done: hand its result and errors to the callback.
        console.log(response);
        funcToCall(
          response.data.taskResult.result,
          response.data.taskResult.errors
        );
        return false;
      }
      // If the task hasn't finished yet, try again in 1 second.
      setTimeout(function () {
        get_status(response.data.taskID, funcToCall);
      }, 1000);
    })
    .fail((error) => {
      console.log(error);
    });
}
This function gets the status of a specific task that our Redis worker is executing. It polls repeatedly, re-calling itself every second, until the expensive computation finishes, and then executes a callback you define, funcToCall.
Let’s see how to use this to work around the 30 second timeout for Heroku HTTP requests:
@app.route('/predict', methods=['GET', 'POST'])
def predict():
    if request.method == 'POST':
        file = request.files['audio_file']
        # base64-encode the upload so it can be passed through Redis
        enc = b64encode(file.read())
        # hand the expensive work to the rq worker
        task = q.enqueue(extract_features_and_predict, enc)
        # return a dictionary with the ID of the task
        responseObject = {"status": "success", "data": {"taskID": task.get_id()}}
        return jsonify(responseObject)
    return None
Instead of doing the expensive computation extract_features_and_predict within my /predict route, I enqueue it as a background task and simply return the task's unique ID.
Now, in my javascript code, I check in on the status of the request and execute my callback function.
function predictImage(image) {
  var form = new FormData();
  form.append('audio_file', image, "mysong.wav");
  fetch("/predict", {
    method: "POST",
    headers: {},
    body: form
  })
    .then(resp => {
      if (resp.ok)
        resp.json().then(response => {
          get_status(response.data.taskID, displayResult);
        });
    })
    .catch(err => {
      console.log("An error occurred", err.message);
      window.alert("Oops! Something went wrong.");
    });
}
I send my request to the /predict route and then immediately start checking whether the background job has finished so I can trigger my response. When the job finally returns, I call my callback function displayResult. It is shown below:
function displayResult(data, errors) {
  // display the result
  hide(spinner);
  imageDisplay.src = './static/' + String(data[0]) + '.png';
  if (imageDisplay.classList.contains('hidden')) {
    show(imageDisplay);
  }
  show(justopinion);
  show(forthissong);
}
This solution is a powerful way to work around the 30 second timeout for Heroku requests and update the frontend from a background job.
Why can’t my Redis worker find files in the tmp directory on Heroku but it works on my local machine?
Heroku has an “ephemeral” hard drive which is cleared every time the app restarts. So, any temporary files you save will go away when the app is deployed, or when it is automatically restarted (once every 24 hours).
The issue comes when you introduce Redis (or Redis To Go) and try to access temporary files. Running a Redis worker means your app has multiple dynos: a web dyno and a worker dyno. Heroku does not sync files across dynos, so if you save a file from your Flask code and try to access it from your Redis worker, the file will not exist on the worker's dyno.
The solution to this issue is to use third-party file storage such as an S3 bucket. Unfortunately, this will incur some cost.
For the frugal: If you are trying to save temporary files and access them using a Redis worker, you are better off hosting your web app on AWS EC2. The free tier has enough storage that you can write a decent amount of temporary files and delete them without any additional cost.
Click here for a great tutorial on deploying your web app to AWS EC2.