Graphcore and Hugging Face have expanded the range of modalities and tasks available in Hugging Face Optimum, the open-source library dedicated to performance optimization

Developers now have a wide choice of off-the-shelf Hugging Face Transformer models optimized to deliver the best possible performance on Graphcore IPUs.

In addition to the BERT Transformer model, available since the launch of Optimum Graphcore, developers have access to nine models covering natural language processing, computer vision, and speech recognition. They come pre-built with IPU configuration files and tuned parameters, ready to use.

New Optimum models

  • Computer vision

    ViT (Vision Transformer) is a major advance in image recognition that uses the Transformer attention mechanism as its main component. When images are fed into ViT, they are broken down into small patches, much as sentences are split into words in language processing systems. Each patch is then encoded by the Transformer (this is called an embedding) so it can be processed individually.
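To make the patching step concrete, here is a minimal NumPy sketch (not Graphcore's or Hugging Face's implementation; `image_to_patch_embeddings` and its random, untrained projection are illustrative only) of how an image is cut into fixed-size patches that are then linearly embedded:

```python
import numpy as np

def image_to_patch_embeddings(image, patch_size=16, embed_dim=8, seed=0):
    """Split an image (H, W, C) into non-overlapping patches and project
    each flattened patch with a linear map, mimicking ViT's
    patch-embedding step (the projection here is random, not trained)."""
    h, w, c = image.shape
    assert h % patch_size == 0 and w % patch_size == 0
    # Cut the image into a grid of patch_size x patch_size tiles.
    patches = (
        image.reshape(h // patch_size, patch_size, w // patch_size, patch_size, c)
             .transpose(0, 2, 1, 3, 4)
             .reshape(-1, patch_size * patch_size * c)
    )
    rng = np.random.default_rng(seed)
    projection = rng.standard_normal((patches.shape[1], embed_dim))
    return patches @ projection  # one embedding vector per patch

# A 224x224 RGB image yields (224/16)^2 = 196 patch embeddings.
embeddings = image_to_patch_embeddings(np.zeros((224, 224, 3)))
print(embeddings.shape)  # (196, 8)
```

Each of the 196 rows then plays the same role in the Transformer as a word embedding does in a sentence.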

  • Natural language processing

    GPT-2 (Generative Pre-trained Transformer 2) is a Transformer model pre-trained for text generation on a very large corpus of English data in a self-supervised manner. This means the model is pre-trained on raw text only, using an automated process to generate inputs and labels without human annotation (which is why it can use publicly available data). It is trained to generate text by guessing the next word in a given sentence.
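This self-supervised setup can be sketched in a few lines: the labels are just the input tokens shifted by one position, so no human annotation is needed (`causal_lm_examples` is a hypothetical helper for illustration, not part of any library):

```python
def causal_lm_examples(token_ids):
    """For causal language modeling, inputs and labels come from the
    same raw token sequence: the label at each position is simply the
    next token, so training data needs no human labeling."""
    inputs = token_ids[:-1]
    labels = token_ids[1:]
    return list(zip(inputs, labels))

# Toy "tokenized" sentence: each (input, label) pair asks the model to
# guess the next word given what came before.
tokens = ["The", "IPU", "trains", "transformers", "fast"]
print(causal_lm_examples(tokens))
```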

    RoBERTa (Robustly Optimized BERT Approach) is a Transformer model pre-trained on a large corpus of English data in a self-supervised way (like GPT-2). It was pre-trained with a Masked Language Modeling (MLM) objective: for a given sentence, it randomly masks 15% of the words and then runs the entire masked sentence through the model to predict the hidden words. RoBERTa can therefore be used for Masked Language Modeling, but it was primarily designed to be fine-tuned for downstream tasks.
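A simplified sketch of the masking step (illustrative only; the real RoBERTa/BERT recipe also sometimes keeps or randomly replaces the selected tokens instead of always masking them):

```python
import random

MASK = "<mask>"

def mask_tokens(tokens, mask_prob=0.15, seed=0):
    """Randomly replace ~15% of tokens with <mask>, keeping the
    originals as labels -- a simplified version of the MLM objective.
    Positions with label None contribute no loss."""
    rng = random.Random(seed)
    masked, labels = [], []
    for tok in tokens:
        if rng.random() < mask_prob:
            masked.append(MASK)
            labels.append(tok)   # the model must recover this token
        else:
            masked.append(tok)
            labels.append(None)  # no loss computed at this position
    return masked, labels
```

During pre-training the model sees the masked sequence and is scored only on how well it recovers the hidden tokens.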

    DeBERTa (Decoding-enhanced BERT with disentangled attention) is a pre-trained neural language model for natural language processing tasks. It improves on the 2018 BERT and 2019 RoBERTa models with two novel techniques, a disentangled attention mechanism and an enhanced mask decoder, significantly improving pre-training efficiency and downstream task performance.
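A toy sketch of the disentangled attention idea, assuming clipped relative-position embeddings (a simplification of the actual DeBERTa formulation, which uses separate learned query/key projections for content and position): instead of adding position into the content embedding up front, each pairwise score sums content-to-content, content-to-position, and position-to-content terms.

```python
import numpy as np

def disentangled_attention_scores(content, rel_pos, max_dist=2):
    """Toy disentangled attention: content is (n, d) content vectors;
    rel_pos is (2*max_dist+1, d) embeddings for clipped relative
    distances in [-max_dist, max_dist]."""
    n, d = content.shape
    scores = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            delta = int(np.clip(i - j, -max_dist, max_dist)) + max_dist
            c2c = content[i] @ content[j]             # content-to-content
            c2p = content[i] @ rel_pos[delta]         # content-to-position
            p2c = rel_pos[2 * max_dist - delta] @ content[j]  # position-to-content
            scores[i, j] = (c2c + c2p + p2c) / np.sqrt(3 * d)
    return scores
```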

    BART is a Transformer encoder-decoder (seq2seq) model with a bidirectional encoder (like BERT) and an autoregressive decoder (like GPT). BART is pre-trained by (1) corrupting text with an arbitrary noising function and (2) training the model to reconstruct the original text. BART is particularly effective when fine-tuned for text generation (e.g. summarization, translation) but also works well for comprehension tasks (classification, question answering, etc.).
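A toy noising function in the spirit of BART's text-infilling corruption (illustrative only; BART's actual noise functions include span infilling with Poisson-distributed span lengths, token deletion, and sentence permutation):

```python
import random

def corrupt(tokens, seed=0):
    """Toy BART-style noising: replace a random contiguous span of two
    tokens with a single <mask> (text infilling). The seq2seq model is
    then trained to reconstruct the original token sequence."""
    rng = random.Random(seed)
    tokens = list(tokens)
    start = rng.randrange(len(tokens) - 1)
    tokens[start:start + 2] = ["<mask>"]  # span collapses into one mask
    return tokens

source = "the quick brown fox jumps over the lazy dog".split()
noisy = corrupt(source)
# Training pair: model input = noisy, reconstruction target = source.
```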

    LXMERT (Learning Cross-Modality Encoder Representations from Transformers) is a multimodal Transformer model for learning vision-and-language representations. It has three encoders: an object-relationship encoder, a language encoder, and a cross-modality encoder. It is pre-trained with a combination of masked language modeling, visual-language text alignment, ROI-feature regression, masked visual-attribute modeling, masked visual-object modeling, and visual question answering objectives. This model achieved state-of-the-art results on the GQA and VQA visual question answering datasets.

    T5 (Text-to-Text Transfer Transformer) is a model capable of casting any text-based language problem, such as translation, question answering, or classification, into a text-to-text format. It provides a consistent framework for converting text problems to a plain-text input and output. Thus, the same model, objective function, hyperparameters, and decoding method can be reused across different natural language processing tasks.
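The text-to-text framing can be illustrated with task prefixes like those used by T5 (the `to_text_to_text` helper is hypothetical; the prefix strings for translation and summarization match those in the T5 paper, and the model's expected answer is also plain text):

```python
def to_text_to_text(task, text):
    """Cast different NLP tasks into a single text-to-text format by
    prepending a task prefix, as T5 does; the model then reads and
    writes plain text for every task."""
    prefixes = {
        "translate": "translate English to German: ",
        "summarize": "summarize: ",
    }
    return prefixes[task] + text

print(to_text_to_text("summarize", "Graphcore and Hugging Face extend Optimum."))
```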

  • Speech recognition

    HuBERT (Hidden-Unit BERT) is a self-supervised speech recognition model pre-trained on audio data. Its training learns a combined acoustic and language model over continuous inputs. HuBERT matches or exceeds wav2vec 2.0 performance on the Librispeech (960 h) and Libri-light (60,000 h) benchmarks with the 10 min, 1 h, 10 h, 100 h and 960 h fine-tuning subsets.

    Wav2Vec2 is a pre-trained, self-supervised speech recognition model. Using a novel contrastive pre-training objective, Wav2Vec2 learns powerful speech representations from large amounts of unlabeled data and can then be fine-tuned on a small amount of transcribed data. It is thus both more powerful and conceptually simpler.
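A toy version of a contrastive objective in the spirit of Wav2Vec2's pre-training (illustrative only; the real model scores quantized targets against distractors using cosine similarity over Transformer context outputs):

```python
import numpy as np

def contrastive_loss(context, positive, distractors, temperature=0.1):
    """Given a context vector for a masked time step, score the true
    speech representation against distractors by cosine similarity and
    return the negative log-probability assigned to the true one."""
    candidates = np.vstack([positive] + list(distractors))
    sims = candidates @ context / (
        np.linalg.norm(candidates, axis=1) * np.linalg.norm(context)
    )
    logits = sims / temperature
    log_probs = logits - np.log(np.sum(np.exp(logits)))
    return -log_probs[0]  # low when the true target wins

# The loss is low when the context matches the true target and high
# when it matches a distractor instead.
ctx = np.array([1.0, 0.0, 0.0])
distractor = np.array([0.0, 1.0, 0.0])
low = contrastive_loss(ctx, ctx, [distractor])
high = contrastive_loss(ctx, distractor, [ctx])
```

Minimizing this loss pushes the model to pick out the true (masked) representation from the distractor set, which is how it learns from unlabeled audio.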

Hugging Face Optimum Graphcore: The Future of a Successful Partnership

Graphcore became a founding member of the Hugging Face Hardware Partner Program in 2021. Both companies had the goal of facilitating innovation in the field of artificial intelligence.

Since then, Graphcore and Hugging Face have worked hard to simplify and speed up the training of Transformer models on IPUs. The first Optimum Graphcore (BERT) model was released last year.

Transformers have proven highly effective across a range of tasks, including feature extraction, text generation, sentiment analysis, translation and more. Models like BERT are commonly used by Graphcore customers in a variety of situations such as cybersecurity, voice call automation, drug discovery, and translation.

Optimizing their performance requires time, effort and expertise that many companies and organizations cannot afford. Thanks to Hugging Face and its open-source library of Transformer models, these problems are a thing of the past. Graphcore's integration with Hugging Face also lets developers take advantage of the models and datasets available on the Hugging Face Hub.

Developers can now rely on Graphcore systems to train ten types of state-of-the-art Transformer models and access thousands of datasets with minimal coding. This partnership gives users the tools and ecosystem to download and fine-tune state-of-the-art pre-trained models suitable for many domains and tasks.

Benefit from the latest hardware and software from Graphcore

While Hugging Face users are already enjoying the speed, performance, and cost benefits of IPU technology, the latest Graphcore hardware and software will amplify those improvements further.

In terms of hardware, the Bow IPU (announced in March and shipping now) is the world’s first processor to use 3D wafer-on-wafer (WoW) stacking technology, taking the already proven performance of the IPU to new heights. Each Bow IPU features advances in computing architecture, silicon implementation, communications and memory. It delivers up to 350 teraFLOPS of AI compute (i.e. up to 40% more performance) and up to 16% better energy efficiency than the previous generation. Hugging Face Optimum users can freely switch between previous-generation IPUs and Bow processors, as no code changes are required.

With software also playing a crucial role, Optimum offers a plug-and-play experience with Graphcore’s easy-to-use Poplar SDK (also updated to version 2.5). Poplar makes it easy to train state-of-the-art models on the most advanced hardware, with full integration with standard machine learning frameworks (including PyTorch, PyTorch Lightning, and TensorFlow) and deployment and orchestration tools like Docker and Kubernetes. Because Poplar is compatible with these widely used third-party systems, developers can easily port models from other computing platforms and take full advantage of the IPU’s advanced AI capabilities.

Source: Hugging Face

And you?

What do you think?

See also:

AI researcher trains AI chatbot on 4chan, turning it into a real hate speech machine; within 24 hours, nine instances of the bot running on 4chan had posted 15,000 times

NLP Cloud now supports GPT-J, an advanced open source automatic language processing model, the open source GPT-3 alternative

Qwant is preparing for a change of direction under pressure from investors: Éric Léandri will leave the presidency, and Qwant will become the French government’s default search engine
