opendatascience | Technologies

Telegram channel opendatascience - Data Science by ODS.ai 🦜

First Telegram Data Science channel. Covering technical and popular material on anything related to Data Science: AI, Big Data, Machine Learning, Statistics, general Math, and applications of the former. To reach the editors, contact: @haarrp

Data Science by ODS.ai 🦜

​​TabR: Unlocking the Power of Retrieval-Augmented Tabular Deep Learning

The deep learning arena is abuzz with the rise of models designed for tabular data problems, challenging the traditional dominance of gradient-boosted decision trees (GBDT) algorithms. Among these, retrieval-augmented tabular DL models, which gather relevant training data like nearest neighbors for better prediction, are gaining traction. However, these novel models have only shown marginal benefits over properly tuned retrieval-free baselines, sparking a debate on the effectiveness of the retrieval-based approach.

In response to this uncertainty, this groundbreaking work presents TabR, an innovative retrieval-based tabular DL model. This breakthrough was achieved by augmenting a simple feed-forward architecture with an attention-like retrieval component. Several overlooked aspects of the attention mechanism were highlighted, leading to major performance improvements. On a set of public benchmarks, TabR stole the show, demonstrating unparalleled average performance, becoming the new state-of-the-art on numerous datasets, and even outperforming GBDT models on a recent benchmark designed to favor them.
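
The core idea is easy to picture: the encoded target object cross-attends to its nearest neighbors among the training objects, and each retrieved neighbor contributes a value built from its label and the key difference, with similarity taken as negative squared L2 distance between keys (one of the "overlooked details"). A minimal PyTorch sketch under those assumptions (names, dimensions, and the classification setup are illustrative, not the authors' code):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class RetrievalBlock(nn.Module):
    """Attention-like retrieval over training candidates (simplified TabR-style)."""

    def __init__(self, d: int, n_classes: int):
        super().__init__()
        self.key = nn.Linear(d, d)                    # shared key projection
        self.label_emb = nn.Embedding(n_classes, d)   # embeds candidate labels
        self.diff_mlp = nn.Sequential(nn.Linear(d, d), nn.ReLU(), nn.Linear(d, d))

    def forward(self, x, cand_x, cand_y, m: int = 32):
        # x: (B, d) encoded targets; cand_x: (N, d) encoded training objects
        k_q, k_c = self.key(x), self.key(cand_x)
        sim = -torch.cdist(k_q, k_c) ** 2             # L2-based similarity
        top_sim, idx = sim.topk(m, dim=-1)            # keep the m nearest candidates
        w = F.softmax(top_sim, dim=-1)                # (B, m) attention weights
        k_sel = k_c[idx]                              # (B, m, d) retrieved keys
        # value = label embedding plus a transform of the key difference
        v = self.label_emb(cand_y[idx]) + self.diff_mlp(k_sel - k_q.unsqueeze(1))
        return x + (w.unsqueeze(-1) * v).sum(dim=1)   # residual update of the target
```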

Code link: https://github.com/yandex-research/tabular-dl-tabr
Paper link: https://arxiv.org/abs/2307.14338

A detailed unofficial overview of the paper:
https://andlukyane.com/blog/paper-review-tabr

#deeplearning #tabular

Data Science by ODS.ai 🦜

​​Meta-Transformer: A Unified Framework for Multimodal Learning

The landscape of multimodal learning is about to witness a remarkable transformation with the introduction of Meta-Transformer, a state-of-the-art framework that's poised to overcome long-standing challenges in the field. The beauty of Meta-Transformer lies in its unique ability to process and understand information from a diverse range of modalities - from natural language, 2D images, 3D point clouds, to audio, video, time series, and tabular data. This ability stems from its innovative design that leverages a frozen encoder to map raw input data from these diverse modalities into a shared token space, eliminating the need for paired multimodal training data.

More than just a theoretical achievement, the Meta-Transformer has proven its practical application across various benchmarks, handling an impressive range of tasks from fundamental perception such as text, image, and audio processing, to more complex applications like X-Ray, infrared, and hyperspectral data interpretation, as well as data mining tasks involving graph, tabular, and time-series data.
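
In code, the design reduces to modality-specific tokenizers feeding one frozen shared encoder, with only lightweight task heads trained per task. A schematic PyTorch sketch (layer sizes, tokenizers, and the head are illustrative assumptions, not the released implementation):

```python
import torch
import torch.nn as nn

class MetaTransformerSketch(nn.Module):
    """Shared frozen encoder over modality-specific tokenizers (illustrative)."""

    def __init__(self, d: int = 768):
        super().__init__()
        # modality-specific tokenizers map raw inputs into the shared token space
        self.image_tok = nn.Conv2d(3, d, kernel_size=16, stride=16)  # patchify
        self.series_tok = nn.Linear(1, d)                            # per time step
        layer = nn.TransformerEncoderLayer(d, nhead=12, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=12)
        for p in self.encoder.parameters():      # the unified encoder stays frozen
            p.requires_grad = False
        self.head = nn.Linear(d, 10)             # only light task heads are trained

    def forward(self, x, modality: str):
        if modality == "image":                  # (B, 3, H, W) -> (B, T, d)
            tokens = self.image_tok(x).flatten(2).transpose(1, 2)
        elif modality == "series":               # (B, T, 1) -> (B, T, d)
            tokens = self.series_tok(x)
        else:
            raise ValueError(modality)
        h = self.encoder(tokens)                 # same frozen weights for all modalities
        return self.head(h.mean(dim=1))          # pooled representation -> task logits
```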

Code link: https://github.com/invictus717/MetaTransformer
Paper link: https://arxiv.org/abs/2307.10802

A detailed unofficial overview of the paper:
https://andlukyane.com/blog/paper-review-meta-transformer

#deeplearning #nlp #transformer #cv

Data Science by ODS.ai 🦜

​​Retentive Network: A Successor to Transformer for Large Language Models

The Retentive Network (RetNet) has been proposed as a game-changing foundation architecture for large language models. RetNet uniquely combines training parallelism, low-cost inference, and impressive performance into one sleek package. It ingeniously draws a theoretical connection between recurrence and attention, opening new avenues in AI exploration. The introduction of the retention mechanism for sequence modeling further enhances this innovation, featuring not one, not two, but three computation paradigms - parallel, recurrent, and chunkwise recurrent!

Specifically, the parallel representation provides the horsepower for training parallelism, while the recurrent representation supercharges low-cost O(1) inference, enhancing decoding throughput, latency, and GPU memory without compromising performance. For long-sequence modeling, the chunkwise recurrent representation is the ace up RetNet's sleeve, enabling efficient handling with linear complexity. Each chunk is encoded in parallel while also recurrently summarizing the chunks, which is nothing short of revolutionary. Based on experimental results in language modeling, RetNet delivers strong scaling results, parallel training, low-cost deployment, and efficient inference. All these groundbreaking features position RetNet as a formidable successor to the Transformer for large language models.
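
The parallel and recurrent forms compute the same outputs. Ignoring RetNet's multi-scale decays, rotations, and normalization, single-head retention reduces to the following equivalence, sketched in PyTorch (illustrative, not the released code):

```python
import torch

def retention_parallel(q, k, v, gamma):
    # q, k, v: (T, d); decay mask D[n, m] = gamma**(n - m) for n >= m, else 0
    T = q.shape[0]
    n = torch.arange(T, dtype=torch.float32)
    D = (gamma ** (n[:, None] - n[None, :])) * (n[:, None] >= n[None, :])
    return (q @ k.T * D) @ v                      # (T, d)

def retention_recurrent(q, k, v, gamma):
    # O(1) state per step: S accumulates decayed outer products k_n^T v_n
    d = q.shape[1]
    S = torch.zeros(d, d)
    outs = []
    for q_n, k_n, v_n in zip(q, k, v):
        S = gamma * S + k_n[:, None] @ v_n[None, :]
        outs.append(q_n @ S)                      # o_n = q_n S_n
    return torch.stack(outs)

q, k, v = (torch.randn(5, 4) for _ in range(3))
assert torch.allclose(retention_parallel(q, k, v, 0.9),
                      retention_recurrent(q, k, v, 0.9), atol=1e-5)
```

The chunkwise form simply applies the parallel computation inside each chunk while carrying the decayed state S across chunk boundaries, which is where the linear complexity on long sequences comes from.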

Code link: https://github.com/microsoft/unilm
Paper link: https://arxiv.org/abs/2307.08621

A detailed unofficial overview of the paper:
https://andlukyane.com/blog/paper-review-retnet

#deeplearning #nlp #llm

Data Science by ODS.ai 🦜

Using the Command Line to Process CSV Files

- to print the first column of a CSV file: awk -F, '{print $1}' file.csv
- to print the first and third columns of a CSV file: awk -F, '{print $1 "," $3}' file.csv
- to print only the lines of a CSV file that contain a specific string: grep "string" file.csv
- to sort a CSV file based on the values in the second column: sort -t, -k2 file.csv
- to remove the first row of a CSV file (the header row): tail -n +2 file.csv
- to remove duplicates from a CSV file based on the values in the first column: awk -F, '!seen[$1]++' file.csv
- to calculate the sum of the values in the third column of a CSV file: awk -F, '{sum+=$3} END {print sum}' file.csv
- to convert a CSV file to a JSON array (assuming columns name,age): jq -R -s 'split("\n") | map(select(length > 0) | split(",") | {name: .[0], age: .[1]})' file.csv
- to convert a CSV file to a SQL INSERT statement: awk -F, '{printf "INSERT INTO table VALUES (\"%s\", \"%s\", \"%s\");\n", $1, $2, $3}' file.csv
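
Caveat: the awk and sort recipes above split on bare commas, so they miscount fields whenever a quoted value contains a comma. When that matters, a CSV-aware tool is safer; a small Python sketch with the standard csv module (the column layout is an assumption):

```python
import csv
import json

with open("file.csv", newline="") as f:
    rows = list(csv.reader(f))        # handles quoted fields and embedded commas

header, data = rows[0], rows[1:]      # drop the header row, like tail -n +2

# sum the third column, like awk '{sum+=$3}'
print(sum(float(r[2]) for r in data))

# convert to a JSON array of objects keyed by the header, like the jq recipe
print(json.dumps([dict(zip(header, r)) for r in data], indent=2))
```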

Data Science by ODS.ai 🦜

@opendatascience Open Positions Post 0

We have received 8 submissions for our Talent Pool so far! Backgrounds vary from data engineers to data leads. What's the best way to connect talent with seekers without compromising privacy?

We suggest that people looking for teammates or looking to hire post their openings in the comments to this post 👇🏻

Data Science by ODS.ai 🦜

​​Kandinsky 2.2
by Sber & AIRI

What has changed since Kandinsky 2.1:

- Improved quality of image generation
- Ability to generate images with different aspect ratios
- Optimized portrait generation to achieve photorealism
- Trained on an extensive dataset of 1.5B text-image pairs
- Generating stickers for Telegram and creating custom sticker packs
- Drawing in missing parts of a picture (inpainting)
- Creating pictures in infinite canvas mode (outpainting)
- Understanding queries in English (Russian remains the main language)
- 20+ painting styles
- Mixing images
- Generating images similar to a given image
- Image styling by text description
- Ability to change individual objects or elements in an image by text description while preserving the composition of the original illustration (ControlNet)

Habr: https://habr.com/ru/companies/sberbank/articles/747446/
GH: https://github.com/ai-forever/Kandinsky-2/
Telegram bot: @kandinsky21_bot
MLSpace: https://cloud.ru/ru/datahub/rugpt3family/kandinsky-2-2
Web-GUI for Kandinsky 2.x: https://github.com/seruva19/kubin
FusionBrain: https://fusionbrain.ai/diffusion
RUdalle: https://rudalle.ru/
Diffusers: https://github.com/huggingface/diffusers/tree/main/src/diffusers/pipelines/kandinsky2_2
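
For a quick start from Python, the Diffusers integration linked above can be used roughly like this (model id and pipeline behavior assumed from that integration; check the linked directory for the current API):

```python
import torch
from diffusers import AutoPipelineForText2Image

# model id assumed from the Diffusers Kandinsky 2.2 integration linked above
pipe = AutoPipelineForText2Image.from_pretrained(
    "kandinsky-community/kandinsky-2-2-decoder", torch_dtype=torch.float16
).to("cuda")

image = pipe(
    "a photorealistic portrait of a cosmonaut, studio lighting",
    negative_prompt="low quality, blurry",
    height=768, width=1024,              # non-square aspect ratios are supported
).images[0]
image.save("kandinsky22.png")
```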

Data Science by ODS.ai 🦜

​​Recognize Anything: A Strong Image Tagging Model

Get ready for a breakthrough in the realm of AI: introducing the Recognize Anything Model (RAM), a powerful new model that is set to revolutionize image tagging. RAM, a titan in the world of large computer vision models, astoundingly exhibits the zero-shot ability to recognize any common category with an impressive level of accuracy. Shattering traditional approaches, RAM employs a unique paradigm for image tagging, utilizing large-scale image-text pairs for training instead of relying on tedious manual annotations.

RAM's development comprises a strategic, four-step process. Initially, annotation-free image tags are obtained on a large scale via an automated text semantic parsing. This is followed by training a preliminary model for automatic annotation, fusing caption and tagging tasks under the supervision of original texts and parsed tags. Then, RAM utilizes a data engine to generate extra annotations and eliminate incorrect ones, refining the input. Finally, the model is meticulously retrained with the cleaned data and fine-tuned using a smaller, higher-quality dataset. Extensive evaluations of RAM have revealed stunning results: it outshines its counterparts like CLIP and BLIP in zero-shot performance, even surpassing fully supervised models, exhibiting a competitive edge akin to Google's tagging API!

Paper link: https://arxiv.org/abs/2306.03514
Code link: https://github.com/xinyu1205/recognize-anything
Project link: https://recognize-anything.github.io/

A detailed unofficial overview of the paper: https://andlukyane.com/blog/paper-review-ram

#deeplearning #cv #imagecaptioning

Data Science by ODS.ai 🦜

Introducing motor interface for amputee | ALVI Labs

This is the first AI model to decode precise finger movements for people with hand amputation. It uses only 8 surface EMG electrodes.

The interface can decode different types of movements in virtual reality:
🔘finger flexion
🔘finger extension
🟣typing
🟣and more

💎Full demo: YouTube link

Subscribe and follow further progress:
Twitter: link
Instagram: link

Please like and repost the YouTube video

Data Science by ODS.ai 🦜

​​Fast Segment Anything

The Segment Anything Model (SAM), a revolutionary tool in computer vision tasks, has significantly impacted various high-level tasks like image segmentation, image captioning, and image editing. However, its application has been restricted in industry scenarios due to its enormous computational demand, largely attributed to the Transformer architecture handling high-resolution inputs.

The authors of this paper have proposed a speedier alternative method that accomplishes this foundational task with performance on par with SAM, but at a staggering 50 times faster! By ingeniously reformulating the task as all-instance segment generation followed by prompt-based selection, and employing a regular CNN detector with an instance segmentation branch, they've converted this task into the well-established instance segmentation task. The magic touch? They've trained the existing instance segmentation method using just 1/50 of the SA-1B dataset, a stroke of brilliance that led to a solution marrying performance and efficiency.
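
So the pipeline has two stages: a CNN instance-segmentation model first generates masks for everything in the image, and the prompt merely selects among them. A hedged sketch of that selection stage (array shapes and helpers are illustrative, not the authors' code):

```python
import numpy as np

# masks: (N, H, W) booleans and scores: (N,) confidences from stage one,
# the all-instance segmentation pass of the CNN detector.

def select_by_point(masks, scores, point):
    """Prompt stage: return the highest-scoring mask containing the clicked point."""
    x, y = point
    hits = [i for i, m in enumerate(masks) if m[y, x]]
    return masks[max(hits, key=lambda i: scores[i])] if hits else None

def select_by_box(masks, box):
    """Prompt stage: return the mask with the highest IoU against a box prompt."""
    x0, y0, x1, y1 = box
    box_mask = np.zeros_like(masks[0], dtype=bool)
    box_mask[y0:y1, x0:x1] = True
    ious = [(m & box_mask).sum() / max((m | box_mask).sum(), 1) for m in masks]
    return masks[int(np.argmax(ious))]
```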

Paper link: https://huggingface.co/papers/2306.12156
Code link: https://github.com/CASIA-IVA-Lab/FastSAM

A detailed unofficial overview of the paper: https://andlukyane.com/blog/paper-review-fastsam

#deeplearning #cv #segmentanythingmodel #efficiency

Data Science by ODS.ai 🦜

​​Voicebox: Text-Guided Multilingual Universal Speech Generation at Scale

In the ever-evolving field of natural language processing and computer vision research, the revolution is being led by large-scale generative models like GPT and DALL-E. These models have the remarkable capability of generating high fidelity text or image outputs, and more importantly, they possess a 'generalist' character, able to solve tasks that they weren't explicitly trained to accomplish. However, when it comes to speech generative models, there's still a significant gap in terms of scale and task generalization. Enter Voicebox - a pioneering advancement set to redefine the landscape of speech generation technology.

Voicebox is an exceptionally versatile text-guided generative model for speech at an impressive scale. Trained on over 50K hours of unfiltered, unenhanced speech data, Voicebox is a non-autoregressive flow-matching model, designed to infill speech, given an audio context and text. Much like its predecessors, Voicebox is able to perform a wide range of tasks through in-context learning, but with an added flexibility - it can condition on future context. The applications are boundless - from mono or cross-lingual zero-shot text-to-speech synthesis to noise removal, content editing, style conversion, and diverse sample generation. What's truly phenomenal is Voicebox's capability to outshine the state-of-the-art zero-shot TTS model, VALL-E, on both intelligibility and audio similarity metrics, while being a staggering 20 times faster.
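
To make the training objective concrete, here is a generic conditional flow-matching step of the kind Voicebox builds on: sample a point on the straight path between noise and data, then regress the model onto that path's velocity. This is a sketch of the general technique, with Voicebox's masked-infilling and text-alignment specifics collapsed into a single conditioning tensor:

```python
import torch
import torch.nn as nn

def flow_matching_step(model: nn.Module, x1: torch.Tensor, cond: torch.Tensor):
    """One conditional flow-matching training step (generic, not Voicebox itself).

    x1:   (B, T, d) clean speech features (e.g. mel frames)
    cond: (B, T, c) conditioning -- in Voicebox's case, the text alignment plus
          the unmasked audio context; treated here as a single tensor.
    """
    x0 = torch.randn_like(x1)                     # noise sample
    t = torch.rand(x1.shape[0], 1, 1)             # uniform time in [0, 1]
    x_t = (1 - t) * x0 + t * x1                   # point on the straight path
    target_v = x1 - x0                            # its constant velocity
    pred_v = model(x_t, t.squeeze(), cond)        # regress the vector field
    return ((pred_v - target_v) ** 2).mean()      # MSE flow-matching loss
```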

Paper link: https://research.facebook.com/publications/voicebox-text-guided-multilingual-universal-speech-generation-at-scale/
Blogpost link: https://ai.facebook.com/blog/voicebox-generative-ai-model-speech/
Project link: https://voicebox.metademolab.com/

A detailed unofficial overview of the paper: https://andlukyane.com/blog/paper-review-voicebox

#deeplearning #nlp #speechgeneration #texttospeech

Data Science by ODS.ai 🦜

​​Self-Supervised Learning from Images with a Joint-Embedding Predictive Architecture

In a recent breakthrough, a novel approach for learning highly semantic image representations has been introduced that eschews the need for hand-crafted data augmentations. The strategy, known as Image-based Joint-Embedding Predictive Architecture (I-JEPA), offers a refreshing, non-generative pathway to self-supervised learning from images. The concept underpinning I-JEPA is deceptively simple, yet incredibly powerful: it takes a single context block from an image and predicts the representations of various target blocks within the same image.

I-JEPA's core design principle - its masking strategy - plays a pivotal role in shaping the system's semantic prowess. The key is to sample target blocks at a sufficiently large, semantic scale while using a context block that provides ample, spatially distributed information. When integrated with Vision Transformers, I-JEPA exhibits impressive scalability. To illustrate, a ViT-Huge/14 model was trained on ImageNet using just 16 A100 GPUs in under 72 hours, delivering robust performance across a wide spectrum of tasks, including linear classification, object counting, and depth prediction.
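
Schematically, the objective is a regression in representation space: a context encoder sees only the context block, an EMA target encoder sees the image, and a predictor must match the target blocks' representations. A simplified sketch (the real masking strategy samples several large target blocks, and the encoder/predictor modules here are placeholders):

```python
import torch
import torch.nn.functional as F

def ijepa_loss(ctx_enc, tgt_enc, predictor, patches, ctx_idx, tgt_idx):
    """I-JEPA-style objective, simplified.

    patches: (B, N, d) patchified image; ctx_idx / tgt_idx: patch index tensors.
    """
    with torch.no_grad():                      # target encoder is an EMA copy,
        h = tgt_enc(patches)                   # never updated by this loss
        targets = h[:, tgt_idx]                # representations of target blocks
    z = ctx_enc(patches[:, ctx_idx])           # encode only the context block
    preds = predictor(z, tgt_idx)              # predict each target patch's rep
    return F.mse_loss(preds, targets)          # loss in representation space,
                                               # no pixel reconstruction involved
```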

Paper link: https://arxiv.org/abs/2301.08243

Code link: https://github.com/facebookresearch/ijepa

Blogpost link: https://ai.facebook.com/blog/yann-lecun-ai-model-i-jepa/

A detailed unofficial overview of the paper: https://andlukyane.com/blog/paper-review-ijepa

#deeplearning #cv #selfsupervisedlearning

Data Science by ODS.ai 🦜

🇵🇹 Are there people in Lisbon? Let's meet for brunch this week!

Data Science by ODS.ai 🦜

​​StableRep: Synthetic Images from Text-to-Image Models Make Strong Visual Representation Learners

In a ground-breaking exploration of visual representation learning, researchers have leveraged synthetic images produced by leading text-to-image models, specifically Stable Diffusion, achieving promising results. The study uncovers two key insights - firstly, when configured correctly, self-supervised methods trained on synthetic images can match or even outperform those trained on real images. This suggests an exciting avenue for efficient and effective representation learning, reducing the need for extensive real image datasets.

Secondly, the researchers have devised a novel approach called StableRep, a multi-positive contrastive learning method that treats multiple images, generated from the same text prompt, as mutual positives. The compelling finding is that StableRep, trained solely with synthetic images, outperforms representations learned by prominent methods such as SimCLR and CLIP, even when these used real images. In a striking demonstration, when language supervision is added, StableRep trained with 20M synthetic images outperforms CLIP trained with a whopping 50M real images. These findings not only underscore the potential of synthetic data but also pave the way for more efficient, large-scale visual representation learning.
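
The multi-positive objective can be written as a cross-entropy between the softmax over embedding similarities and a target distribution that is uniform over all images generated from the same prompt. A small PyTorch sketch under that reading (temperature and names are illustrative):

```python
import torch
import torch.nn.functional as F

def multi_positive_contrastive_loss(z, caption_ids, tau: float = 0.1):
    """StableRep-style loss sketch: images from the same text prompt are positives.

    z: (B, d) L2-normalized image embeddings
    caption_ids: (B,) id of the prompt each image was generated from
    """
    sim = z @ z.T / tau                               # (B, B) similarities
    sim.fill_diagonal_(float("-inf"))                 # exclude self-pairs
    pos = caption_ids[:, None] == caption_ids[None, :]
    pos.fill_diagonal_(False)
    # ground-truth assignment: uniform over the positives of each anchor
    target = pos.float() / pos.sum(dim=1, keepdim=True).clamp(min=1)
    log_q = F.log_softmax(sim, dim=1)                 # contrastive distribution
    return -(target * log_q).sum(dim=1).mean()        # cross-entropy to target
```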

Paper link: https://arxiv.org/abs/2306.00984

A detailed unofficial overview of the paper: https://andlukyane.com/blog/paper-review-stablerep

#deeplearning #cv #nlp #stablediffusion #texttoimage #syntheticdata

Data Science by ODS.ai 🦜

CodeTF: One-stop Transformer Library for State-of-the-art Code LLM (Salesforce)

The authors present CodeTF, an open-source Transformer-based library for state-of-the-art Code LLMs and code intelligence. CodeTF is designed with a unified interface to enable rapid access and development across different types of models, datasets, and tasks. The library supports a collection of pretrained Code LLMs and popular code benchmarks, including a standardized interface to train and serve code LLMs efficiently, as well as data features such as language-specific parsers and utility functions for extracting code attributes.

Data Science by ODS.ai 🦜

​​Chain of Hindsight Aligns Language Models with Feedback

AI language models are becoming a major part of our digital world. The challenge, however, lies in aligning these models with human preferences to be genuinely useful and valuable. Current methods, although successful in many ways, have limitations - they are either inefficient in utilizing data or depend heavily on challenging reward functions and reinforcement learning.

Here comes "Chain of Hindsight," an exciting, novel technique inspired by human learning mechanisms. It can learn from any form of feedback, even transforming it into language for fine-tuning the model. This approach conditions the model on a sequence of model generations paired with feedback, helping it learn to correct negative attributes or errors. It is significantly outperforming previous methods, particularly showing major strides in summarization and dialogue tasks.
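
Concretely, feedback is rendered as plain language and attached to the model's own generations, so fine-tuning on the resulting sequences teaches the model what separates a good answer from a bad one. A toy sketch of assembling such a training string (templates are illustrative, not the paper's exact wording):

```python
# Feedback becomes ordinary text in the sequence; as described, fine-tuning
# focuses the loss on the answer tokens so the model learns what "helpful"
# vs "unhelpful" means.

def coh_example(prompt: str, good: str, bad: str) -> str:
    return (
        f"{prompt}\n"
        f"A helpful answer: {good}\n"     # generation paired with positive feedback
        f"An unhelpful answer: {bad}"     # generation paired with negative feedback
    )

print(coh_example(
    "Summarize: The cat sat on the mat while the dog slept.",
    "A cat sat on a mat near a sleeping dog.",
    "Cats are popular pets.",
))
# At inference time the model is steered by prompting with the positive
# prefix, e.g. "A helpful answer:".
```
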
Paper link: https://arxiv.org/abs/2302.02676

A detailed unofficial overview of the paper: https://andlukyane.com/blog/paper-review-coh
#deeplearning #nlp #llm

Data Science by ODS.ai 🦜

An interesting theoretical result on gradient descent complexity. I missed it before.

https://www.quantamagazine.org/computer-scientists-discover-limits-of-major-research-algorithm-20210817/

The Complexity of Gradient Descent: CLS = PPAD ∩ PLS

https://arxiv.org/abs/2011.01929

Data Science by ODS.ai 🦜

Edge Guided GANs with Multi-Scale Contrastive Learning for Semantic Image Synthesis

ECGAN is a new framework for the challenging task of semantic image synthesis.

🖥 Github: https://github.com/ha0tang/ecgan

📕 Paper: https://arxiv.org/abs/2307.12084v1

🔥 Dataset: https://paperswithcode.com/dataset/cityscapes

ai_machinelearning_big_data

Data Science by ODS.ai 🦜

​​Paper Review: Llama 2: Open Foundation and Fine-Tuned Chat Models

Introducing Llama 2, a cutting-edge family of large language models ranging from 7 to 70 billion parameters! These models, specially fine-tuned for dialogue use cases, not only outperform existing open-source chat models but also showcase exemplary performance in safety and helpfulness. Llama 2's creators have opened the door for the AI community, sharing their detailed approach to inspire further advancements in the development of responsible AI.

Project link: https://ai.meta.com/llama/
Model link: https://github.com/facebookresearch/llama
Paper link: https://ai.meta.com/research/publications/llama-2-open-foundation-and-fine-tuned-chat-models/

A detailed unofficial overview of the paper:
https://andlukyane.com/blog/paper-review-llama2

#deeplearning #nlp #safetyai #responsibleai

Data Science by ODS.ai 🦜

​​Scaling Autoregressive Multi-Modal Models: Pretraining and Instruction Tuning

Introducing CM3Leon (pronounced “Chameleon”), a multi-modal language model that's revolutionizing the realms of text and image generation. This model, designed with a decoder-only, retrieval-augmented, and token-based structure, expands on the established CM3 multi-modal architecture. It showcases the striking benefits of scaling and diversification in instruction-style data. The most impressive part? It's the first of its kind, trained with a recipe inspired by text-only language models, including a substantial retrieval-augmented pretraining phase and a secondary multi-task supervised fine-tuning (SFT) stage. It exemplifies the power of general-purpose models, capable of both text-to-image and image-to-text generation.

CM3Leon isn't just a theoretical model, but a proven performer. Through extensive experiments, it demonstrates the effectiveness of this new approach for multi-modal models. Remarkably, it achieves state-of-the-art performance in text-to-image generation, requiring 5x less training compute than comparable methods, and achieving a zero-shot MS-COCO FID of 4.88. Post-SFT, CM3Leon exhibits an unmatched level of controllability across various tasks, ranging from language-guided image editing to image-controlled generation and segmentation.

Paper link: https://ai.meta.com/research/publications/scaling-autoregressive-multi-modal-models-pretraining-and-instruction-tuning/
Blogpost link: https://ai.meta.com/blog/generative-ai-text-images-cm3leon/

A detailed unofficial overview of the paper:
https://andlukyane.com/blog/paper-review-cm3leon

#deeplearning #cv #nlp #imagegeneration #sota #multimodal

Data Science by ODS.ai 🦜

Practical ML Conf - The biggest offline ML conference of the year in Moscow.

- https://pmlconf.yandex.ru
- September 7, Moscow
- For speakers: offline
- For participants: offline and online (youtube)
- The conference language is Russian.

The call for papers is open: https://pmlconf.yandex.ru/call_for_papers

#conference #nlp #cv #genAI #recsys #mlops #ecomm #hardware #research #offline #online

Data Science by ODS.ai 🦜

​​UniverSeg: Universal Medical Image Segmentation

Get ready for a major breakthrough in the field of medical image segmentation! Deep learning models, despite being the primary tool for medical image segmentation, have always struggled to generalize to new, unseen segmentation tasks involving different anatomies, image modalities, or labels. This has typically required researchers to spend significant time and resources on training or fine-tuning models for each new task, a process often out of reach for many clinical researchers. Enter UniverSeg, a trailblazing solution that simplifies this process by tackling unseen medical segmentation tasks without any need for additional training. Its revolutionary Cross-Block mechanism delivers accurate segmentation maps from a query image and a set of example image-label pairs, completely eliminating the need for retraining.

To make this leap, the team behind UniverSeg went the extra mile and assembled MegaMedical, an expansive collection of over 22,000 scans from 53 diverse open-access medical segmentation datasets. This wide variety of anatomies and imaging modalities provided a comprehensive training ground for UniverSeg, priming it to excel in a multitude of scenarios. The results are nothing short of phenomenal - UniverSeg substantially outperforms several related methods on unseen tasks, bringing a new era of efficiency and accessibility to medical imaging.
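
Inference is pure in-context prediction: one forward pass over the query image plus a support set of example image-label pairs, with no gradient updates. A hedged sketch of such a call (the signature mirrors the idea; see the code link for the released package's actual API):

```python
import torch

def segment_in_context(model, query, support_images, support_labels):
    """query: (B, 1, H, W); support_*: (B, S, 1, H, W) example image-label pairs.

    Hypothetical wrapper: the model consumes the support set directly, so a
    new segmentation task needs examples, not retraining.
    """
    with torch.no_grad():
        logits = model(query, support_images, support_labels)  # one forward pass
    return torch.sigmoid(logits) > 0.5                         # binary mask
```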

Paper link: https://arxiv.org/abs/2304.06131
Project link: https://universeg.csail.mit.edu/
Code link: https://github.com/JJGO/UniverSeg

A detailed unofficial overview of the paper:
https://andlukyane.com/blog/paper-review-universeg-med

#deeplearning #cv #imagesegmentation

Data Science by ODS.ai 🦜

​​Hiera: A Hierarchical Vision Transformer without the Bells-and-Whistles

In the rapidly evolving landscape of artificial intelligence, a groundbreaking approach to supervised classification has emerged. Modern hierarchical vision transformers have been known to incorporate various vision-specific components, aiming to enhance accuracies and produce desirable FLOP counts. However, these augmentations have led to slower processing times compared to their vanilla ViT counterparts. In this exciting research, the authors challenge the necessity of such additional complexities.

Enter Hiera, an innovative and significantly simplified hierarchical vision transformer that champions efficiency without compromising accuracy. By deploying a potent visual pretext task, MAE, the authors are able to eliminate the bells-and-whistles from a state-of-the-art multi-stage vision transformer. The result? A lean, mean machine learning model that not only outperforms its predecessors in terms of accuracy but also achieves superior speed, both during inference and training. Tested across a diverse array of image and video recognition tasks, Hiera stands as a beacon of progress in the field of computer vision.

Paper link: https://arxiv.org/abs/2306.00989
Code link: https://github.com/facebookresearch/hiera

A detailed unofficial overview of the paper: https://andlukyane.com/blog/paper-review-hiera

#deeplearning #cv #transformer #sota

Data Science by ODS.ai 🦜

​​Multilingual End to End Entity Linking

Introducing BELA, an unprecedented, open-source solution that is set to revolutionize the Natural Language Processing (NLP) arena! BELA addresses the complex challenge of Entity Linking, a task prevalent in many practical applications, by offering the very first fully end-to-end multilingual model. Astoundingly, it can efficiently identify and link entities in texts across an expansive range of 97 languages, a capability hitherto unseen. This marks a significant leap towards streamlining complex model stacks that have been a pervasive issue in the field.

BELA's architectural novelty lies in its adoption of a bi-encoder design. This enables it to conduct end-to-end linking of a passage in a single forward pass through a transformer, regardless of the number of entity mentions it contains. In its core Entity Disambiguation sub-task, it cleverly deploys a k-nearest neighbor (kNN) search using an encoded mention as a query in an entity index. What's even more impressive is BELA's scalability—it handles up to 16 million entities and delivers a remarkable throughput of 53 samples per second on a single GPU.
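
The disambiguation step is a standard dense kNN lookup: embed the mention with the bi-encoder's mention head, then search a prebuilt entity index. A sketch with FAISS and random stand-in vectors (the real system indexes roughly 16M entity embeddings):

```python
import numpy as np
import faiss  # pip install faiss-cpu

d = 256                                              # embedding width (illustrative)
entity_vecs = np.random.rand(100_000, d).astype("float32")  # stand-ins for the
faiss.normalize_L2(entity_vecs)                      # full entity embedding table
index = faiss.IndexFlatIP(d)                         # inner-product (cosine) search
index.add(entity_vecs)

# stand-in for an encoded mention produced by the bi-encoder's mention head
mention = np.random.rand(1, d).astype("float32")
faiss.normalize_L2(mention)
scores, entity_ids = index.search(mention, 10)       # k-nearest entity candidates
print(entity_ids[0])                                 # disambiguation picks among these
```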

Paper link: https://arxiv.org/abs/2306.08896
Code link: https://github.com/facebookresearch/BELA

A detailed unofficial overview of the paper: https://andlukyane.com/blog/paper-review-bela

#deeplearning #nlp #entitylinking #multilingual

Data Science by ODS.ai 🦜

​​Tracking Everything Everywhere All at Once

In the field of motion estimation, a remarkable breakthrough has just arrived! Introducing OmniMotion, an innovative method that pioneers a complete and globally consistent motion representation. OmniMotion moves beyond the constraints of traditional optical flow or particle video tracking algorithms that are hindered by limited temporal windows and difficulties in maintaining global consistency of estimated motion trajectories. Instead, OmniMotion enables accurate, full-length motion estimation of every pixel in a video sequence - a truly remarkable feat.

OmniMotion represents a video using a quasi-3D canonical volume and accomplishes pixel-wise tracking via the transformation between local and canonical spaces. This representation doesn't just ensure global consistency; it also opens the doors to tracking through occlusions and modeling any mixture of camera and object motion. The extensive evaluations conducted on the TAP-Vid benchmark and real-world footage have proven that OmniMotion outperforms existing state-of-the-art methods by a substantial margin, both quantitatively and qualitatively.

Paper link: https://arxiv.org/abs/2306.05422
Project link: https://omnimotion.github.io/

A detailed unofficial overview of the paper: https://artgor.medium.com/paper-review-tracking-everything-everywhere-all-at-once-27caa13918bc

#deeplearning #cv #motionestimation

Data Science by ODS.ai 🦜

​​Principle-Driven Self-Alignment of Language Models from Scratch with Minimal Human Supervision

AI-assistant agents like ChatGPT have largely depended on supervised fine-tuning and reinforcement learning from human feedback. But, this method brings its own set of challenges - high costs, potential biases, and constraints on the true potential of these AI agents. What if there was a more effective, self-sufficient way to align AI output with human intentions? Enter Self-ALIGN, a groundbreaking methodology that marries principle-driven reasoning and the generative capabilities of large language models. This promising approach takes the AI realm by storm, offering a novel way to ensure our AI models are more helpful, ethical, and reliable - all with minimal human intervention.

Self-ALIGN is a multistage process that works by generating synthetic prompts from a large language model, augmenting prompt diversity, and leveraging a concise set of human-written principles to guide AI models. When applied to the LLaMA-65b base language model, it led to the creation of a new AI assistant, Dromedary, with fewer than 300 lines of human annotations. Dromedary not only outshines several state-of-the-art AI systems, such as Text-Davinci-003 and Alpaca, but does so on a variety of benchmark datasets.

Paper link: https://arxiv.org/abs/2305.03047

Code link: https://mitibmdemos.draco.res.ibm.com/dromedary

A detailed unofficial overview of the paper: https://andlukyane.com/blog/paper-review-dromedary

#deeplearning #nlp #llm

Data Science by ODS.ai 🦜

Stack Overflow 2023 Developer Survey

Data Science by ODS.ai 🦜

​​BiomedGPT: A Unified and Generalist Biomedical Generative Pre-trained Transformer for Vision, Language, and Multimodal Tasks

This paper introduces the groundbreaking Biomedical Generative Pre-trained Transformer (BiomedGPT), which revolutionizes the field of biomedicine by offering a unified and generalist approach. BiomedGPT harnesses the power of self-supervision on extensive and diverse datasets, enabling it to effortlessly handle multi-modal inputs and excel in a wide range of downstream tasks. In a series of comprehensive experiments, BiomedGPT astoundingly outperforms its predecessors, emerging as the unrivaled leader across five distinct tasks and a staggering 20 public datasets encompassing over 15 unique biomedical modalities. Its ability to deliver expansive and all-encompassing representations of biomedical data heralds a significant advancement in the field, with promising implications for improving healthcare outcomes.

Through meticulous ablation studies, the efficacy of BiomedGPT's multi-modal and multi-task pretraining approach is vividly showcased. This groundbreaking model effortlessly transfers its vast knowledge to previously unseen data, demonstrating its versatility and adaptability. The implications of this research are profound, paving the way for the development of unified and all-encompassing models for biomedicine.

Paper link: https://arxiv.org/abs/2305.17100

Code link: https://github.com/taokz/BiomedGPT

A detailed unofficial overview of the paper: https://andlukyane.com/blog/paper-review-biomedgpt

#deeplearning #nlp #selfsupervised #gpt #biomedicine

Data Science by ODS.ai 🦜

​​The effectiveness of MAE pre-pretraining for billion-scale pretraining

Revolutionizing the current pretrain-then-finetune paradigm of computer vision, this research has introduced an innovative pre-pretraining stage. Utilizing the Masked Autoencoder (MAE) technique for model initialization, this pre-pretraining strategy scales with the size of both the model and the data. This makes it an ideal tool for training next-generation foundation models, even on the grandest scales.

The robustness of the pre-pretraining technique is demonstrated by consistent improvement in model convergence and downstream transfer performance across diverse model scales and dataset sizes. The authors measured the effectiveness of pre-pretraining on a wide array of visual recognition tasks, and the results have been promising. The largest model achieved unprecedented results on iNaturalist-18 (91.3%), 1-shot ImageNet-1k (62.1%), and zero-shot transfer on Food-101 (96.0%), underlining the tremendous potential of proper model initialization, even when handling web-scale pretraining with billions of images.

Paper link: https://arxiv.org/abs/2303.13496

A detailed unofficial overview of the paper: https://andlukyane.com/blog/paper-review-mae-pretrain

#deeplearning #cv #pretraining #selfsupervisedlearning

Data Science by ODS.ai 🦜

​​QLoRA: Efficient Finetuning of Quantized LLMs

This paper introduces QLoRA, a novel finetuning approach that decreases memory usage significantly while maintaining impressive performance. Imagine this - a 65 billion parameter model finetuned on a single 48GB GPU, while preserving full 16-bit task performance. The method backpropagates gradients through a frozen, 4-bit quantized pretrained language model into Low-Rank Adapters, opening up new frontiers in machine learning. The icing on the cake is the high-performing model family, Guanaco, which trumps all previously released models on the Vicuna benchmark, achieving a staggering 99.3% of the performance level of ChatGPT with just 24 hours of finetuning on a single GPU.

The study also unveils several innovative techniques to conserve memory without compromising performance. These include 4-bit NormalFloat (NF4), an innovative data type that is theoretically optimal for normally distributed weights, double quantization for average memory footprint reduction, and paged optimizers to handle memory spikes. The QLoRA approach was applied to finetune more than 1000 models, leading to a detailed analysis of instruction following and chatbot performance across various model types and scales. The results affirm that QLoRA finetuning on a small, high-quality dataset yields state-of-the-art results, even with smaller models than previously used. A notable finding is that GPT-4 evaluations offer a cost-effective alternative to human evaluation. All models and code, including CUDA kernels for 4-bit training, have been released by the researchers.
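
The recipe is readily reproduced with the Hugging Face stack that integrated QLoRA; a sketch of the setup (checkpoint name and LoRA hyperparameters are illustrative, not the paper's exact configuration):

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model

# 4-bit NF4 base model with double quantization, as described above
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",               # the NormalFloat data type
    bnb_4bit_use_double_quant=True,          # quantize the quantization constants
    bnb_4bit_compute_dtype=torch.bfloat16,
)
model = AutoModelForCausalLM.from_pretrained(
    "huggyllama/llama-7b",                   # example checkpoint, not prescriptive
    quantization_config=bnb_config,
    device_map="auto",
)

# gradients flow through the frozen 4-bit weights into trainable LoRA adapters
lora = LoraConfig(r=64, lora_alpha=16, target_modules=["q_proj", "v_proj"],
                  lora_dropout=0.05, task_type="CAUSAL_LM")
model = get_peft_model(model, lora)
model.print_trainable_parameters()           # only a tiny fraction is trainable
```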

Paper link: https://arxiv.org/abs/2305.14314
Code link: https://github.com/artidoro/qlora
CUDA kernels link: https://github.com/TimDettmers/bitsandbytes

A detailed unofficial overview of the paper: https://andlukyane.com/blog/paper-review-qlora
#deeplearning #nlp #llm #quantization

Data Science by ODS.ai 🦜

​​MMS: Scaling Speech Technology to 1000+ languages

Get ready for a breakthrough in speech technology that is set to revolutionize the world of communication! The field, which has so far been restricted to around a hundred languages, barely scratches the surface of the more than 7,000 languages spoken globally. The Massively Multilingual Speech (MMS) project is taking a monumental leap to bridge this gap, increasing the number of supported languages by an astounding 10 to 40 times, depending on the task. This unprecedented expansion will be a game-changer, significantly improving global access to information and creating a more inclusive digital landscape.

This incredible feat is achieved through the creation of a new dataset drawn from publicly available religious texts and the strategic implementation of self-supervised learning. The MMS project's achievements are staggering, including the development of pre-trained wav2vec 2.0 models for 1,406 languages, a single multilingual automatic speech recognition model for 1,107 languages, speech synthesis models for as many languages, and a language identification model for a whopping 4,017 languages. Even more impressive is the significant improvement in accuracy - the MMS multilingual speech recognition model more than halves the word error rate of Whisper on 54 languages of the FLEURS benchmark, despite being trained on a significantly smaller dataset.
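
The released ASR checkpoints are usable through Hugging Face Transformers, with per-language adapters swapped in at runtime. A sketch of transcription (model id and adapter mechanism as documented for MMS at release; treat the details as assumptions and check the code link):

```python
import numpy as np
import torch
from transformers import AutoProcessor, Wav2Vec2ForCTC

model_id = "facebook/mms-1b-all"                 # released multilingual ASR checkpoint
processor = AutoProcessor.from_pretrained(model_id)
model = Wav2Vec2ForCTC.from_pretrained(model_id)

# swap in the language-specific adapter and vocabulary, e.g. French
processor.tokenizer.set_target_lang("fra")
model.load_adapter("fra")

waveform = np.zeros(16_000, dtype=np.float32)    # stand-in for 1 s of 16 kHz audio
inputs = processor(waveform, sampling_rate=16_000, return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits
ids = torch.argmax(logits, dim=-1)[0]
print(processor.decode(ids))                     # CTC-decoded transcription
```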

Paper link: https://research.facebook.com/publications/scaling-speech-technology-to-1000-languages/
Blogpost link: https://ai.facebook.com/blog/multilingual-model-speech-recognition/
Code link: https://github.com/facebookresearch/fairseq/tree/main/examples/mms

A detailed unofficial overview of the paper: https://andlukyane.com/blog/paper-review-mms
#deeplearning #speechrecognition #tts #audio
