How To Use AugLy On Image, Video, Audio, and Text?

Facebook recently released the AugLy data augmentation library as open source. In this article, we will take a deep dive into the package.

Introduction to Data Augmentation

Before we dive into the library itself, however, let’s take some time to understand what data augmentation is about.

Many coding examples are shown below. You can view the complete notebook on Github.

What is data augmentation? 

Data augmentation is a machine learning technique that creates modified versions of existing training examples in order to increase the size of the training set. 

Generally speaking, increasing the size of the training data increases the performance of a neural network. This is why data augmentation is used to create artificial variations of the training data. With data augmentation, the model is poised to perform better during training and generalize well during testing. 

Let’s look at an example to understand better. Say you are training a model to classify dog images and you have several photos of a dog. Most likely, the shots were taken from the same angle and at a similar distance from the dog. If the model is trained on these images alone, it is prone to misclassify a test image of a dog taken from an elevated height or at a closer range. To make the model robust to all kinds of images, data augmentation takes an image shot from one specific angle and generates more variations of it: shot from a different angle, at a closer range, at a tilted shear angle, and so on. 
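
These variations can be generated programmatically. As a minimal sketch, independent of any augmentation library, here is how a single image (represented as a numpy array) can be turned into several training samples:

```python
import numpy as np

# Toy "image": a 4x4 grayscale array standing in for a dog photo
image = np.arange(16).reshape(4, 4)

# Three simple augmentations that mimic different "shots" of the same subject
flipped = np.fliplr(image)   # mirror image (different viewpoint)
cropped = image[1:3, 1:3]    # close-range crop
rotated = np.rot90(image)    # different camera angle

augmented_dataset = [image, flipped, cropped, rotated]
print(len(augmented_dataset))  # one original image became four training samples
```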

About AugLy

AugLy is a recently released open-source Python library for data augmentation. Its aim is to help AI models become more robust during training and evaluation. Data augmentation can involve operations such as cropping an image or changing the pitch of a voice in an audio file, and AugLy helps to automatically create such variations of the data. 

According to Facebook, AugLy is the first open-source tool of its kind to span several modalities (images, videos, audio and text), which is immensely important for emerging AI research. It draws on the real operations people perform on images on Facebook and Instagram to offer over 100 augmentations. For instance, overlaying emojis, text or screenshots is something many individuals do, so AugLy performs such transformations for its data augmentation. 

Another operation humans now perform is combining data of different modalities. For instance, the text ‘you look good’ may sound like a compliment. However, adding an emoji, say a clown emoji, completely changes how the text is perceived: the ‘compliment’ would undoubtedly be read as an insult. This is how people take in information in today’s world, and AugLy takes these realities into account. As more and more data modalities are combined, there is a need for all the corresponding augmentations and transformations to live under a single library or API. 

Facebook notes that the augmentations in the library mirror the transformations that users of Facebook, Instagram and WhatsApp typically apply. Thus, the library is particularly useful for folks working on AI models for social media applications. 

How AugLy Works

The AugLy library is divided into four sub-libraries, one for each data modality (audio, images, videos and text). All four sub-libraries, however, share a unified interface. Data augmentation can be done using a function-based approach as well as a class-based approach. There are also intensity functions that determine how intense a transformation is; these are controlled by the parameters passed when calling an AugLy function. Each function can also produce metadata describing the transformation, should you want better insight into how the data was transformed. 
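
The dual interface can be pictured with a small schematic (illustrative plain Python, not AugLy’s actual code): every augmentation exists both as a plain function and as a configurable, callable class, and both can append AugLy-style metadata.

```python
def scale(image, factor=1.0, metadata=None):
    """Function-based style: apply the transform in one call."""
    result = [px * factor for px in image]  # stand-in for real image scaling
    if metadata is not None:
        # AugLy-style metadata: record the transform name and its parameters
        metadata.append({"name": "scale", "factor": factor})
    return result

class Scale:
    """Class-based style: configure once, reuse across many inputs."""
    def __init__(self, factor=1.0):
        self.factor = factor

    def __call__(self, image, metadata=None):
        return scale(image, factor=self.factor, metadata=metadata)

meta = []
image = [10, 20, 30]  # toy "image" as a flat list of pixel values
out = Scale(factor=0.5)(image, metadata=meta)
print(out)   # [5.0, 10.0, 15.0]
print(meta)  # [{'name': 'scale', 'factor': 0.5}]
```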

The team of engineers and researchers at Facebook that developed AugLy pride themselves on aggregating the data augmentation approaches used by existing libraries and further creating novel methods of their own. For instance, transforming an image can involve adding the Facebook interface around it. This mimics the way images are typically screenshotted and shared on the platform, so it is critical for the model to understand that, while it is still the same image, there may be distractions such as a social media interface around it. Even with these new elements around the image, the model should still make accurate predictions. 

Why AugLy 

Although AugLy was built for data augmentation, its applications go beyond training models robustly on a dataset. One interesting application of AugLy is the detection of copied content. SimSearchNet, for instance, is a model built to detect similar content on the web, and it was trained using AugLy for data augmentation. 

Furthermore, AugLy can be used to determine the effectiveness of models with respect to data augmentation. In other words, the library can be used as a yardstick to determine whether your model was correctly trained to be robust to various data transformations. As a use case, Facebook announced that AugLy was the library used in selecting the top 5 models in the Deepfake Detection Challenge.

Image Data Augmentation with AugLy

First off, the library is installed using pip.

!pip install -U augly

To use the sub-library for images, we import the augly.image submodule as imaugs. The utils submodule is used to locate the image for this illustration, which is one of the sample images bundled with AugLy.

import os
import augly.image as imaugs
import augly.utils as utils
from IPython.display import display
 
# Get input image, scale it down to avoid taking up the whole screen
input_img_path = os.path.join(
    utils.TEST_URI, "image", "inputs", "dfdc_1.jpg"
)

Using AugLy for Image Scaling

AugLy can be used to scale an image using the scale() method of imaugs. The method takes an important parameter called factor, which determines how large or small the image will appear. If factor is set to a small value, the image appears smaller; if set to a larger value, the image appears larger.

input_img = imaugs.scale(input_img_path, factor=0.1)
display(input_img)

When set to a higher value, the image appears larger.

input_img = imaugs.scale(input_img_path, factor=0.5)
display(input_img)

Creating Memes with AugLy

AugLy can be used to create a meme via the meme_format() method. Given an image and a caption, this method produces an image that looks like a meme; here, the output contains the caption string ‘LOL’. The code snippet below calls the meme_format() method with some parameters discussed momentarily.

display(
  imaugs.meme_format(
        input_img,
        caption_height=75,
        meme_bg_color=(0, 0, 0),
        text_color=(255, 255, 255),
    )
)

Parameters such as caption_height, meme_bg_color and text_color are used to tweak how the meme is rendered. (0, 0, 0) is the RGB value for black, while (255, 255, 255) is that of white.

Converting Images to Mobile Screenshots

The Compose() class is used to chain several transformations of an image. It takes a list of transformations, such as adjusting the image’s saturation and overlaying it onto a screenshot template. In the code snippet below, the saturation is left at its default value while the template is set to mobile.

aug = imaugs.Compose(
    [
        imaugs.Saturation(),
        imaugs.OverlayOntoScreenshot(
            template_filepath=os.path.join(
                utils.SCREENSHOT_TEMPLATES_DIR, "mobile.png"
            ),
        ),
        imaugs.Scale(factor=0.5),
    ]
)
display(aug(input_img))

AugLy with Textual Data

AugLy can simulate typographical errors in a piece of text, mimicking the noisy way people actually type. To perform this, the augly.text submodule is imported as textaugs. Afterwards, the simulate_typos method is called with the text passed as an argument. This is demonstrated in the code below.

import augly.text as textaugs
message = 'Hi everyone, I hope you are having a great time here'

# introduce realistic typos into the text
aug_text = textaugs.simulate_typos(message)

print(aug_text)

Output: a version of the message with typos introduced, such as misspellings or characters swapped with their keyboard neighbours (the exact result may vary between runs).
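
Under the hood, typo simulation boils down to small character-level perturbations. The toy function below (purely illustrative, not AugLy code) shows one such perturbation, swapping a pair of adjacent characters; AugLy’s own implementation uses richer strategies such as keyboard-distance substitutions and misspelling dictionaries.

```python
import random

def swap_adjacent(text, seed=42):
    """Toy typo simulator: swap one random pair of adjacent characters."""
    rng = random.Random(seed)           # seeded for reproducibility
    i = rng.randrange(len(text) - 1)    # pick a position to perturb
    chars = list(text)
    chars[i], chars[i + 1] = chars[i + 1], chars[i]
    return "".join(chars)

print(swap_adjacent("augmentation"))
```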

AugLy with Audio Data

AugLy can perform operations such as changing the pitch of an audio, slowing down or speeding up the rate of an audio, tweaking the metadata of the audio file, adding background noise, etc. 

If you wish to increase the pitch of an audio file, the code below performs such a task. First, make sure augly is installed on your machine and import the necessary libraries; in this case, the augly.audio submodule is imported as audaugs.

import os
import augly.audio as audaugs
import augly.utils as utils
from IPython.display import display, Audio

Next, the audio file is loaded. Here, a three-second clip of my own recording is used; it was uploaded to my Google Drive, which is then mounted on Google Colab.

from google.colab import drive
drive.mount('/content/drive')
input_audio = '/content/drive/MyDrive/audiodata.wav'

The pitch_shift method is used to raise or lower the pitch of the audio. It receives the audio file and the number of steps as parameters, and returns the augmented audio along with its sample rate. Here, the pitch is raised by 10 steps.

#increase the pitch of the audio file
aug_audio, sr = audaugs.pitch_shift(input_audio, n_steps=10)
display(Audio(aug_audio, rate=sr))
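
A pitch step corresponds to a semitone, so (assuming the standard equal-temperament relation used by common audio libraries) shifting by n steps scales every frequency by 2**(n/12):

```python
# Each pitch step is one semitone; n steps scale frequency by 2 ** (n / 12).
# Shifting a 440 Hz tone (concert A) up by 10 steps gives:
ratio = 2 ** (10 / 12)
print(f"{440 * ratio:.1f} Hz")  # ≈ 784.0 Hz, roughly the note G5
```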

To stretch or compress an audio file in time, the time_stretch method is used. It receives the audio file and a rate as parameters. When the rate is set above 1, the duration is shorter; in other words, the audio plays faster. The reverse is the case when the rate is set below 1.

aug_audio, sr = audaugs.time_stretch(
    input_audio,
    rate=2
)
display(Audio(aug_audio, rate=sr))

Since the rate was set to 2, the audio, which was once 3 seconds long, is now 1.5 seconds long.
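
The relationship between rate and duration is simply new_duration = original_duration / rate:

```python
# time_stretch changes duration as: new_duration = original_duration / rate
original_duration = 3.0  # seconds
for rate in (0.5, 1.0, 2.0):
    print(f"rate={rate}: {original_duration / rate:.1f}s")
# rate=0.5: 6.0s, rate=1.0: 3.0s, rate=2.0: 1.5s
```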

Using the Compose class to add more features to the Audio

If you wish to add more elements to the audio, you can use the Compose class and pass a list of the transformations you want applied. To add background noise, call the AddBackgroundNoise() class; to add clicking sounds, call the Clicks() class. This is demonstrated in the code below.

# class-based audio transforms operate on a numpy array,
# so the clip is first loaded into an array (here via librosa)
import librosa
input_audio_arr, sr = librosa.load(input_audio, sr=None)

aug = audaugs.Compose(
    [
        audaugs.AddBackgroundNoise(),
        audaugs.Clicks()
    ]
)
aug_audio, sr = aug(input_audio_arr, sample_rate=sr)
display(Audio(aug_audio, rate=sr))

The output is the same audio but with a background noise and a clicking sound.

Using AugLy with Video files

AugLy also works with video files for various augmentation processes, such as overlaying text, looping a video, adding an emoji, etc. To use AugLy with video files, the necessary libraries are imported. Afterwards, a function is created to display the video.

from IPython.display import display, HTML
from base64 import b64encode
 
def display_video(path):
  mp4 = open(path,'rb').read()
  data_url = "data:video/mp4;base64," + b64encode(mp4).decode()
  display(
    HTML(
      """
          <video width=400 controls>
                <source src="%s" type="video/mp4">
          </video>
      """ % data_url
    )
  )

Then the sample video to be used is loaded and trimmed to its first three seconds.

import os
import augly.utils as utils
import augly.video as vidaugs
 
# Get input video, trim to first 3 seconds
input_video = os.path.join(
    utils.TEST_URI, "video", "inputs", "input_1.mp4"
)
input_vid_path = "/tmp/in_video.mp4"
out_vid_path = "/tmp/aug_video.mp4"
 
# We can use the AugLy trim augmentation, and save the trimmed video
vidaugs.trim(input_video, output_path=input_vid_path, start=0, end=3)
display_video(input_vid_path)

Add Text to Videos with AugLy

To add text to the video, the overlay_text method is used.

vidaugs.overlay_text(input_vid_path, out_vid_path)
display_video(out_vid_path)

Loop a Video with AugLy

To loop the video, the loop() method is used; num_loops=1 plays the clip one extra time. Passing a list via the metadata parameter records details of the transformation.

meta = []
vidaugs.loop(
    input_vid_path,
    out_vid_path,
    num_loops=1,
    metadata=meta,
)
display_video(out_vid_path)
meta

Output:

[{'dst_duration': 6.050139,
  'dst_fps': 29.916666666666668,
  'dst_height': 1080,
  'dst_segments': [OrderedDict([('start', 0.0), ('end', 3.008357)]),
   OrderedDict([('start', 3.008357), ('end', 6.016714)])],
  'dst_width': 1920,
  'intensity': 1.0,
  'name': 'loop',
  'num_loops': 1,
  'src_duration': 3.008357,
  'src_fps': 29.916666666666668,
  'src_height': 1080,
  'src_segments': [OrderedDict([('start', 0.0), ('end', 3.008357)]),
   OrderedDict([('start', 0.0), ('end', 3.008357)])],
  'src_width': 1920}]
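
The metadata lets you sanity-check the transformation. With num_loops=1 the source should play twice, so the expected output duration is about double the source duration (the small gap from the reported dst_duration likely comes from encoding overhead):

```python
# With num_loops=1 the source plays twice, so the expected output duration
# is src_duration * (num_loops + 1).
src_duration = 3.008357  # from the metadata above
num_loops = 1
expected = src_duration * (num_loops + 1)
print(round(expected, 6))  # 6.016714
```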

Use Compose to Perform Multiple Video Transformations

Just as in audio files, the Compose class can be used to perform many transformations at once. The code below adds some noise, blur and overlay dots to the video.

aug = vidaugs.Compose(
    [
        vidaugs.AddNoise(),
        vidaugs.Blur(sigma=5.0),
        vidaugs.OverlayDots(),
    ]
)
aug(input_vid_path, out_vid_path)
display_video(out_vid_path)

Avi Arora

Avi is a Computer Science student at the Georgia Institute of Technology pursuing a Masters in Machine Learning. He is a software engineer at Capital One and the co-founder of Octtone, a company that creates software products in the Health & Wellness space.