ALBERT – A Lite BERT

Every researcher and NLP (Natural Language Processing) practitioner is well aware of BERT (Bidirectional Encoder Representations from Transformers), which appeared in 2018 and has since transformed the NLP industry to a great extent. ALBERT, short for ‘A Lite BERT’, was designed to make BERT as light as possible by shrinking its parameter count. A major advantage of deep learning for a task such as sentiment analysis is that very little data preprocessing is needed; often the only step required is converting the text to lower case. With classical machine learning methods such as logistic regression on TF-IDF features, uninformative words also have to be removed.

Transformer models, especially BERT, transformed the NLP pipeline. They eased the problem of sparse annotations for text data: instead of training a model from scratch, practitioners can simply fine-tune an existing pre-trained model. But the sheer size of BERT makes it slightly unapproachable; running training or even inference with it is compute-intensive and time-consuming. ALBERT is a lite version of BERT that downsizes the model while largely maintaining its performance. Researchers at Google Research and the Toyota Technological Institute at Chicago published the model in a paper presented at ICLR 2020.
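
As a quick illustration of that fine-tuning workflow, here is a minimal sketch that loads a released ALBERT checkpoint for sentence classification. It assumes the Hugging Face transformers library and the public albert-base-v2 checkpoint, neither of which is mentioned in the article; a real application would add a training loop on labelled data.

    # Minimal sketch: load a pre-trained ALBERT checkpoint and run it on one sentence.
    # Assumes the Hugging Face "transformers" library and the "albert-base-v2" checkpoint.
    import torch
    from transformers import AlbertTokenizer, AlbertForSequenceClassification

    tokenizer = AlbertTokenizer.from_pretrained("albert-base-v2")
    model = AlbertForSequenceClassification.from_pretrained("albert-base-v2", num_labels=2)

    inputs = tokenizer("ALBERT keeps BERT-level quality with far fewer parameters.",
                       return_tensors="pt")
    with torch.no_grad():
        logits = model(**inputs).logits   # fine-tuning would update these weights instead
    print(logits.shape)                   # torch.Size([1, 2])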

ALBERT – The Architecture

ALBERT, like BERT, is an encoder-only model: it applies multi-headed self-attention over the input tokens rather than using a separate decoder. The backbone of the architecture is the multi-headed, multi-layer Transformer encoder.

The main ‘mission’ of ALBERT is to reduce the number of parameters (by up to around 90%) using novel techniques, without taking a big hit to performance. The compressed model also scales better than the original BERT, improving performance while keeping the model small. It consists of several identical blocks stacked one above the other; each block contains a multi-head attention sub-layer and a feed-forward network. Beyond this shared backbone, ALBERT makes a few targeted changes to the BERT recipe. The following are the techniques ALBERT uses to achieve its compression.

  1. Factorized embedding parameterization
    Hidden-layer representations need to be large so they can capture contextual information on top of the word-level embedding information. However, growing the hidden size blows up the number of parameters when the vocabulary embeddings grow with it. ALBERT therefore factorizes the word-level input embeddings into a lower-dimensional space: tokens are first mapped to a small embedding and then projected up to the hidden size (see the parameter-count sketch after this list).
  2. Cross-layer parameter sharing
    Although stacking independent layers increases the learning capacity of the model, it also introduces a great deal of redundancy. ALBERT deals with this redundancy by sharing parameters across layers (or groups of layers), which sharply reduces the total parameter count while keeping the number of layers constant (a sketch of this idea follows the list).
  3. Inter-sentence coherence loss
    BERT is pre-trained with an NSP (Next Sentence Prediction) objective, which the ALBERT authors found too easy to be very useful. ALBERT replaces it with a sentence-order prediction (SOP) loss: the model sees two consecutive segments either in their original order or swapped, and must predict which is the case. This focuses pre-training on inter-sentence coherence and improves the representations on downstream tasks (see the small example after this list).
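
To make the factorization concrete, the small calculation below compares the size of a single V x H embedding table with ALBERT's two-step V x E lookup plus E x H projection. The numbers (V = 30,000, H = 768, E = 128) are illustrative values in the range used by ALBERT-base, not figures taken from the article.

    # Parameter count of the input embeddings, untied vs. factorized (illustrative sizes).
    V, H, E = 30_000, 768, 128          # vocabulary, hidden size, embedding size

    bert_style = V * H                  # one big V x H embedding matrix
    albert_style = V * E + E * H        # V x E lookup followed by an E x H projection

    print(f"untied    : {bert_style:,} parameters")    # 23,040,000
    print(f"factorized: {albert_style:,} parameters")  # 3,938,304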
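
The next sketch shows the idea of cross-layer parameter sharing: one Transformer encoder layer whose weights are reused at every depth. It is a simplified PyTorch illustration, not ALBERT's actual implementation, and the layer sizes are assumed values.

    import torch.nn as nn

    class SharedEncoder(nn.Module):
        """One encoder layer applied repeatedly, so extra depth adds no new parameters."""
        def __init__(self, hidden=768, heads=12, depth=12):
            super().__init__()
            self.layer = nn.TransformerEncoderLayer(d_model=hidden, nhead=heads,
                                                    batch_first=True)
            self.depth = depth

        def forward(self, x):
            for _ in range(self.depth):  # the same weights are applied at every layer
                x = self.layer(x)
            return x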
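
Finally, a small example of how sentence-order prediction training pairs can be built: a positive example keeps two consecutive segments in their original order, a negative example swaps them. This is a hypothetical data-preparation helper, not code from the ALBERT release.

    import random

    def make_sop_example(segment_a, segment_b):
        """Return ((first, second), label): 1 = original order, 0 = swapped."""
        if random.random() < 0.5:
            return (segment_a, segment_b), 1
        return (segment_b, segment_a), 0

    pair, label = make_sop_example("ALBERT shares parameters across its layers.",
                                   "That is why the model stays small.")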

ALBERT is a very useful, highly compact variant of BERT. It can improve performance on downstream language-understanding tasks while keeping the computational overhead at an acceptable level for many applications.
