Technical Articles

YOLOv9: Learning What You Want to Learn Using Programmable Gradient Information

Chien-Yao Wang , I-Hau Yeh , Hong-Yuan Mark Liao

Artificial Intelligence , Machine Learning , Natural Language Processing , Large Language Models

LongRoPE: Extending LLM Context Window Beyond 2 Million Tokens

Yiran Ding , Li Lyna Zhang , Chengruidong Zhang , Yuanyuan Xu , Ning Shang , Jiahang Xu , Fan Yang , Mao Yang

Artificial Intelligence , Machine Learning , Natural Language Processing , Large Language Models

Large Language Models for Data Annotation and Synthesis: A Survey

Zhen Tan , Dawei Li , Song Wang , Alimohammad Beigi , Bohan Jiang , Amrita Bhattacharjee , Mansooreh Karami , Jundong Li , Lu Cheng , Huan Liu

Artificial Intelligence , Machine Learning , Natural Language Processing , Large Language Models

TinyLLaVA: A Framework of Small-scale Large Multimodal Models

Baichuan Zhou , Ying Hu , Xi Weng , Junlong Jia , Jie Luo , Xien Liu , Ji Wu , Lei Huang

Artificial Intelligence , Machine Learning , Natural Language Processing , Large Language Models

Back to Basics: Revisiting REINFORCE Style Optimization for Learning from Human Feedback in LLMs

Arash Ahmadian , Chris Cremer , Matthias Gallé , Marzieh Fadaee , Julia Kreutzer , Olivier Pietquin , Ahmet Üstün , Sara Hooker

Artificial Intelligence , Machine Learning , Natural Language Processing , Large Language Models

Genie: Generative Interactive Environments

Jake Bruce , Michael Dennis , Ashley Edwards , Jack Parker-Holder , Yuge (Jimmy) Shi , Edward Hughes , Matthew Lai , Aditi Mavalankar , Richie Steigerwald , Chris Apps , Yusuf Aytar , Sarah Bechtle , Feryal Behbahani , Stephanie Chan , Nicolas Heess , Lucy Gonzalez , Simon Osindero , Sherjil Ozair , Scott Reed , Jingwei Zhang , Konrad Zolna , Jeff Clune , Nando de Freitas , Satinder Singh , Tim Rocktäschel

Artificial Intelligence , Machine Learning , Natural Language Processing , Large Language Models

CARTE: Pretraining and Transfer for Tabular Learning

Myung Jun Kim , Léo Grinsztajn , Gaël Varoquaux

Artificial Intelligence , Machine Learning , Natural Language Processing , Large Language Models

The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits

Shuming Ma , Hongyu Wang , Lingxiao Ma , Lei Wang , Wenhui Wang , Shaohan Huang , Li Dong , Ruiping Wang , Jilong Xue , Furu Wei

Artificial Intelligence , Machine Learning , Natural Language Processing , Large Language Models

Sora Generates Videos with Stunning Geometrical Consistency

Xuanyi Li , Daquan Zhou , Chenxu Zhang , Shaodong Wei , Qibin Hou , Ming-Ming Cheng

Artificial Intelligence , Machine Learning , Natural Language Processing , Large Language Models

When Scaling Meets LLM Finetuning: The Effect of Data, Model and Finetuning Method

Biao Zhang , Zhongtao Liu , Colin Cherry , Orhan Firat

Artificial Intelligence , Machine Learning , Natural Language Processing , Large Language Models

Griffin: Mixing Gated Linear Recurrences with Local Attention for Efficient Language Models

Soham De , Samuel L. Smith , Anushan Fernando , Aleksandar Botev , George Cristian-Muraru , Albert Gu , Ruba Haroun , Leonard Berrada , Yutian Chen , Srivatsan Srinivasan , Guillaume Desjardins , Arnaud Doucet , David Budden , Yee Whye Teh , Razvan Pascanu , Nando De Freitas , Caglar Gulcehre

Artificial Intelligence , Machine Learning , Natural Language Processing , Large Language Models

Learning and Leveraging World Models in Visual Representation Learning

Quentin Garrido , Mahmoud Assran , Nicolas Ballas , Adrien Bardes , Laurent Najman , Yann LeCun

Artificial Intelligence , Machine Learning , Natural Language Processing , Large Language Models

SynCode: LLM Generation with Grammar Augmentation

Shubham Ugare , Tarun Suresh , Hangoo Kang , Sasa Misailovic , Gagandeep Singh

Artificial Intelligence , Machine Learning , Natural Language Processing , Large Language Models

The Hidden Attention of Mamba Models

Ameen Ali , Itamar Zimerman , Lior Wolf

Artificial Intelligence , Machine Learning , Natural Language Processing , Large Language Models

Training-Free Pretrained Model Merging

Zhengqi Xu , Ke Yuan , Huiqiong Wang , Yong Wang , Mingli Song , Jie Song

Artificial Intelligence , Machine Learning , Natural Language Processing , Large Language Models

Vision-RWKV: Efficient and Scalable Visual Perception with RWKV-Like Architectures

Yuchen Duan , Weiyun Wang , Zhe Chen , Xizhou Zhu , Lewei Lu , Tong Lu , Yu Qiao , Hongsheng Li , Jifeng Dai , Wenhai Wang

Artificial Intelligence , Machine Learning , Natural Language Processing , Large Language Models

The WMDP Benchmark: Measuring and Reducing Malicious Use With Unlearning

Nathaniel Li , Alexander Pan , Anjali Gopal , Summer Yue , Daniel Berrios , Alice Gatti , Justin D. Li , Ann-Kathrin Dombrowski , Shashwat Goel , Long Phan , Gabriel Mukobi , Nathan Helm-Burger , Rassin Lababidi , Lennart Justen , Andrew B. Liu , Michael Chen , Isabelle Barrass , Oliver Zhang , Xiaoyuan Zhu , Rishub Tamirisa , Bhrugu Bharathi , Adam Khoja , Zhenqi Zhao , Ariel Herbert-Voss , Cort B. Breuer , Samuel Marks , Oam Patel , Andy Zou , Mantas Mazeika , Zifan Wang , Palash Oswal , Weiran Lin , Adam A. Hunt , Justin Tienken-Harder , Kevin Y. Shih , Kemper Talley , John Guan , Russell Kaplan , Ian Steneker , David Campbell , Brad Jokubaitis , Alex Levinson , Jean Wang , William Qian , Kallol Krishna Karmakar , Steven Basart , Stephen Fitz , Mindy Levine , Ponnurangam Kumaraguru , Uday Tupakula , Vijay Varadharajan , Ruoyu Wang , Yan Shoshitaishvili , Jimmy Ba , Kevin M. Esvelt , Alexandr Wang , Dan Hendrycks

Artificial Intelligence , Machine Learning , Natural Language Processing , Large Language Models

Evolution Transformer: In-Context Evolutionary Optimization

Robert Tjarko Lange , Yingtao Tian , Yujin Tang

Artificial Intelligence , Machine Learning , Natural Language Processing , Large Language Models

Enhancing Vision-Language Pre-training with Rich Supervisions

Yuan Gao , Kunyu Shi , Pengkai Zhu , Edouard Belval , Oren Nuriel , Srikar Appalaraju , Shabnam Ghadar , Vijay Mahadevan , Zhuowen Tu , Stefano Soatto

Artificial Intelligence , Machine Learning , Natural Language Processing , Large Language Models

Scaling Rectified Flow Transformers for High-Resolution Image Synthesis

Patrick Esser , Sumith Kulal , Andreas Blattmann , Rahim Entezari , Jonas Muller , Harry Saini , Yam Levi , Dominik Lorenz , Axel Sauer , Frederic Boesel , Dustin Podell , Tim Dockhorn , Zion English , Kyle Lacey , Alex Goodwin , Yannik Marek , Robin Rombach

Artificial Intelligence , Machine Learning , Natural Language Processing , Large Language Models

Design2Code: Benchmarking Multimodal Code Generation for Automated Front-End Engineering

Chenglei Si , Yanzhe Zhang , Ryan Li , Zhengyuan Yang , Ruibo Liu , Diyi Yang

Artificial Intelligence , Machine Learning , Natural Language Processing , Large Language Models

ShortGPT: Layers in Large Language Models are More Redundant Than You Expect

Xin Men , Mingyu Xu , Qingyu Zhang , Bingning Wang , Hongyu Lin , Yaojie Lu , Xianpei Han , Weipeng Chen

Artificial Intelligence , Machine Learning , Natural Language Processing , Large Language Models

Backtracing: Retrieving the Cause of the Query

Rose E. Wang , Pawan Wirawarn , Omar Khattab , Noah Goodman , Dorottya Demszky

Artificial Intelligence , Machine Learning , Natural Language Processing , Large Language Models

Learning to Decode Collaboratively with Multiple Language Models

Shannon Zejiang Shen , Hunter Lang , Bailin Wang , Yoon Kim , David Sontag

Artificial Intelligence , Machine Learning , Natural Language Processing , Large Language Models

SaulLM-7B: A pioneering Large Language Model for Law

Pierre Colombo , Telmo Pessoa Pires , Malik Boudiaf , Dominic Culver , Rui Melo , Caio Corro , André F. T. Martins , Fabrizio Esposito , Vera Lúcia Raposo , Sofia Morgado , Michael Desa

Artificial Intelligence , Machine Learning , Natural Language Processing , Large Language Models

Are Language Models Puzzle Prodigies? Algorithmic Puzzles Unveil Serious Challenges in Multimodal Reasoning

Deepanway Ghosal , Vernon Toh Yan Han , Chia Yew Ken , Soujanya Poria

Artificial Intelligence , Machine Learning , Natural Language Processing , Large Language Models

3D Diffusion Policy: Generalizable Visuomotor Policy Learning via Simple 3D Representations

Yanjie Ze , Gu Zhang , Kangning Zhang , Chenyuan Hu , Muhan Wang , Huazhe Xu

Artificial Intelligence , Machine Learning , Natural Language Processing , Large Language Models

MedMamba: Vision Mamba for Medical Image Classification

Yubiao Yue , Zhenzhang Li

Artificial Intelligence , Machine Learning , Natural Language Processing , Large Language Models

GaLore: Memory-Efficient LLM Training by Gradient Low-Rank Projection

Jiawei Zhao , Zhenyu Zhang , Beidi Chen , Zhangyang Wang , Anima Anandkumar , Yuandong Tian

Artificial Intelligence , Machine Learning , Natural Language Processing , Large Language Models

Stop Regressing: Training Value Functions via Classification for Scalable Deep RL

Jesse Farebrother , Jordi Orbay , Quan Vuong , Adrien Ali Taïga , Yevgen Chebotar , Ted Xiao , Alex Irpan , Sergey Levine , Pablo Samuel Castro , Aleksandra Faust , Aviral Kumar , Rishabh Agarwal

Artificial Intelligence , Machine Learning , Natural Language Processing , Large Language Models

How Far Are We from Intelligent Visual Deductive Reasoning?

Yizhe Zhang , He Bai , Ruixiang Zhang , Jiatao Gu , Shuangfei Zhai , Josh Susskind , Navdeep Jaitly

Artificial Intelligence , Machine Learning , Natural Language Processing , Large Language Models

Common 7B Language Models Already Possess Strong Math Capabilities

Chen Li , Weiqi Wang , Jingcheng Hu , Yixuan Wei , Nanning Zheng , Han Hu , Zheng Zhang , Houwen Peng

Artificial Intelligence , Machine Learning , Natural Language Processing , Large Language Models

Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context

Artificial Intelligence , Machine Learning , Natural Language Processing , Large Language Models

Is Cosine-Similarity of Embeddings Really About Similarity?

Harald Steck , Chaitanya Ekanadham , Nathan Kallus

Artificial Intelligence , Machine Learning , Natural Language Processing , Large Language Models

LLM4Decompile: Decompiling Binary Code with Large Language Models

Hanzhuo Tan , Qi Luo , Jing Li , Yuqun Zhang

Artificial Intelligence , Machine Learning , Natural Language Processing , Large Language Models

Algorithmic Progress in Language Models

Anson Ho , Tamay Besiroglu , Ege Erdil , David Owen , Robi Rahman , Zifan Carl Guo , David Atkinson , Neil Thompson , Jaime Sevilla

Artificial Intelligence , Machine Learning , Natural Language Processing , Large Language Models

Stealing Part of a Production Language Model

Nicholas Carlini , Daniel Paleka , Krishnamurthy (Dj) Dvijotham , Thomas Steinke , Jonathan Hayase , A. Feder Cooper , Katherine Lee , Matthew Jagielski , Milad Nasr , Arthur Conmy , Itay Yona , Eric Wallace , David Rolnick , Florian Tramèr

Artificial Intelligence , Machine Learning , Natural Language Processing , Large Language Models

Chronos: Learning the Language of Time Series

Abdul Fatir Ansari , Lorenzo Stella , Caner Turkmen , Xiyuan Zhang , Pedro Mercado , Huibin Shen , Oleksandr Shchur , Syama Sundar Rangapuram , Sebastian Pineda Arango , Shubham Kapoor , Jasper Zschiegner , Danielle C. Maddix , Hao Wang , Michael W. Mahoney , Kari Torkkola , Andrew Gordon Wilson , Michael Bohlke-Schneider , Yuyang Wang

Artificial Intelligence , Machine Learning , Natural Language Processing , Large Language Models

Simple and Scalable Strategies to Continually Pre-train Large Language Models

Adam Ibrahim , Benjamin Thérien , Kshitij Gupta , Mats L. Richter , Quentin Anthony , Timothée Lesort , Eugene Belilovsky , Irina Rish

Artificial Intelligence , Machine Learning , Natural Language Processing , Large Language Models

Language models scale reliably with over-training and on downstream tasks

Samir Yitzhak Gadre , Georgios Smyrnis , Vaishaal Shankar , Suchin Gururangan , Mitchell Wortsman , Rulin Shao , Jean Mercat , Alex Fang , Jeffrey Li , Sedrick Keh , Rui Xin , Marianna Nezhurina , Igor Vasiljevic , Jenia Jitsev , Luca Soldaini , Alexandros G. Dimakis , Gabriel Ilharco , Pang Wei Koh , Shuran Song , Thomas Kollar , Yair Carmon , Achal Dave , Reinhard Heckel , Niklas Muennighoff , Ludwig Schmidt

Artificial Intelligence , Machine Learning , Natural Language Processing , Large Language Models

BurstAttention: An Efficient Distributed Attention Framework for Extremely Long Sequences

Ao Sun , Weilin Zhao , Xu Han , Cheng Yang , Zhiyuan Liu , Chuan Shi , Maosong Sun

Artificial Intelligence , Machine Learning , Natural Language Processing , Large Language Models

LocalMamba: Visual State Space Model with Windowed Selective Scan

Tao Huang , Xiaohuan Pei , Shan You , Fei Wang , Chen Qian , Chang Xu

Artificial Intelligence , Machine Learning , Natural Language Processing , Large Language Models

GiT: Towards Generalist Vision Transformer through Universal Language Interface

Haiyang Wang , Hao Tang , Li Jiang , Muhammad Ferjad Naeem , Hongsheng Li , Bernt Schiele , Liwei Wang

Artificial Intelligence , Machine Learning , Natural Language Processing , Large Language Models

MM1: Methods, Analysis & Insights from Multimodal LLM Pre-training

Brandon McKinzie , Zhe Gan , Jean-Philippe Fauconnier , Sam Dodge , Bowen Zhang , Philipp Dufter , Dhruti Shah , Xianzhi Du , Futang Peng , Floris Weers , Anton Belyi , Haotian Zhang , Karanjeet Singh , Doug Kang , Ankur Jain , Hongyu Hè , Max Schwarzer , Tom Gunter , Xiang Kong , Aonan Zhang , Jianyu Wang , Chong Wang , Nan Du , Tao Lei , Sam Wiseman , Guoli Yin , Mark Lee , Zirui Wang , Ruoming Pang , Peter Grasch , Alexander Toshev , Yinfei Yang

Artificial Intelligence , Machine Learning , Natural Language Processing , Large Language Models

RAFT: Adapting Language Model to Domain Specific RAG

Tianjun Zhang , Shishir G. Patil , Naman Jain , Sheng Shen , Matei Zaharia , Ion Stoica , Joseph E. Gonzalez

Artificial Intelligence , Machine Learning , Natural Language Processing , Large Language Models

TnT-LLM: Text Mining at Scale with Large Language Models

Mengting Wan , Tara Safavi , Sujay Kumar Jauhar , Yujin Kim , Scott Counts , Jennifer Neville , Siddharth Suri , Chirag Shah , Ryen W. White , Longqi Yang , Reid Andersen , Georg Buscher , Dhruv Joshi , Nagu Rangan

Artificial Intelligence , Machine Learning , Natural Language Processing , Large Language Models

Decoding Compressed Trust: Scrutinizing the Trustworthiness of Efficient LLMs Under Compression

Junyuan Hong , Jinhao Duan , Chenhui Zhang , Zhangheng Li , Chulin Xie , Kelsey Lieberman , James Diffenderfer , Brian Bartoldson , Ajay Jaiswal , Kaidi Xu , Bhavya Kailkhura , Dan Hendrycks , Dawn Song , Zhangyang Wang , Bo Li

Artificial Intelligence , Machine Learning , Natural Language Processing , Large Language Models

Parameter Efficient Reinforcement Learning from Human Feedback

Hakim Sidahmed , Samrat Phatale , Alex Hutcheson , Zhuonan Lin , Zhang Chen , Zac Yu , Jarvis Jin , Simral Chaudhary , Roman Komarytsia , Christiane Ahlheim , Yonghao Zhu , Bowen Li , Saravanan Ganesh , Bill Byrne , Jessica Hoffmann , Hassan Mansoor , Wei Li , Abhinav Rastogi , Lucas Dixon

Artificial Intelligence , Machine Learning , Natural Language Processing , Large Language Models

Evaluating Reward Models for Language Modeling

Nathan Lambert , Valentina Pyatkin , Jacob Morrison , LJ Miranda , Bill Yuchen Lin , Khyathi Chandu , Nouha Dziri , Sachin Kumar , Tom Zick , Yejin Choi , Noah A. Smith , Hannaneh Hajishirzi

Artificial Intelligence , Machine Learning , Natural Language Processing , Large Language Models

LlamaFactory: Unified Efficient Fine-Tuning of 100+ Language Models

Yaowei Zheng , Richong Zhang , Junhao Zhang , Yanhan Ye , Zheyan Luo , Zhangchi Feng , Yongqiang Ma

Artificial Intelligence , Machine Learning , Natural Language Processing , Large Language Models

RakutenAI-7B: Extending Large Language Models for Japanese

Aaron Levine , Connie Huang , Chenguang Wang , Eduardo Batista , Ewa Szymanska , Hongyi Ding , Hou Wei Chou , Jean-François Pessiot , Johanes Effendi , Justin Chiu , Kai Torben Ohlhus , Karan Chopra , Keiji Shinzato , Koji Murakami , Lee Xiong , Lei Chen , Maki Kubota , Maksim Tkachenko , Miroku Lee , Naoki Takahashi , Prathyusha Jwalapuram , Ryutaro Tatsushima , Saurabh Jain , Sunil Kumar Yadav , Ting Cai , Wei-Te Chen , Yandi Xia , Yuki Nakayama , Yutaka Higashiyama

Artificial Intelligence , Machine Learning , Natural Language Processing , Large Language Models

SiMBA: Simplified Mamba-based Architecture for Vision and Multivariate Time series

Badri N. Patro , Vijay S. Agneeswaran

Artificial Intelligence , Machine Learning , Natural Language Processing , Large Language Models

Can Large Language Models Explore In-Context?

Akshay Krishnamurthy , Keegan Harris , Dylan J. Foster , Cyril Zhang , Aleksandrs Slivkins

Artificial Intelligence , Machine Learning , Natural Language Processing , Large Language Models

LLM2LLM: Boosting LLMs with Novel Iterative Data Enhancement

Nicholas Lee , Thanakul Wattanawong , Sehoon Kim , Karttikeya Mangalam , Sheng Shen , Gopala Anumanchipalli , Michael W. Mahoney , Kurt Keutzer , Amir Gholami

Artificial Intelligence , Machine Learning , Natural Language Processing , Large Language Models

AIOS: LLM Agent Operating System

Kai Mei , Xi Zhu , Wujiang Xu , Wenyue Hua , Mingyu Jin , Zelong Li , Shuyuan Xu , Ruosong Ye , Yingqiang Ge , Yongfeng Zhang

Artificial Intelligence , Machine Learning , Natural Language Processing , Large Language Models

The Unreasonable Ineffectiveness of the Deeper Layers

Andrey Gromov , Kushal Tirumala , Hassan Shapourian , Paolo Glorioso , Daniel A. Roberts

Artificial Intelligence , Machine Learning , Natural Language Processing , Large Language Models

BioMedLM: A 2.7B Parameter Language Model Trained On Biomedical Text

Elliot Bolton , Abhinav Venigalla , Michihiro Yasunaga , David Hall , Betty Xiong , Tony Lee , Roxana Daneshjou , Jonathan Frankle , Percy Liang , Michael Carbin , Christopher D. Manning

Artificial Intelligence , Machine Learning , Natural Language Processing , Large Language Models

ViTAR: Vision Transformer with Any Resolution

Qihang Fan , Quanzeng You , Xiaotian Han , Yongfei Liu , Yunzhe Tao , Huaibo Huang , Ran He , Hongxia Yang

Artificial Intelligence , Machine Learning , Natural Language Processing , Large Language Models

Long-form factuality in large language models

Jerry Wei , Chengrun Yang , Xinying Song , Yifeng Lu , Nathan Hu , Jie Huang , Dustin Tran , Daiyi Peng , Ruibo Liu , Da Huang , Cosmo Du , Quoc V. Le

Artificial Intelligence , Machine Learning , Natural Language Processing , Large Language Models

Mini-Gemini: Mining the Potential of Multi-modality Vision Language Models

Yanwei Li , Yuechen Zhang , Chengyao Wang , Zhisheng Zhong , Yixin Chen , Ruihang Chu , Shaoteng Liu , Jiaya Jia

Artificial Intelligence , Machine Learning , Natural Language Processing , Large Language Models

LISA: Layerwise Importance Sampling for Memory-Efficient Large Language Model Fine-Tuning

Rui Pan , Xiang Liu , Shizhe Diao , Renjie Pi , Jipeng Zhang , Chi Han , Tong Zhang

Artificial Intelligence , Machine Learning , Natural Language Processing , Large Language Models

Mechanistic Design and Scaling of Hybrid Architectures

Michael Poli , Armin W Thomas , Eric Nguyen , Pragaash Ponnusamy , Björn Deiseroth , Kristian Kersting , Taiji Suzuki , Brian Hie , Stefano Ermon , Christopher Ré , Ce Zhang , Stefano Massaroli

Artificial Intelligence , Machine Learning , Natural Language Processing , Large Language Models

MagicLens: Self-Supervised Image Retrieval with Open-Ended Instructions

Kai Zhang , Yi Luan , Hexiang Hu , Kenton Lee , Siyuan Qiao , Wenhu Chen , Yu Su , Ming-Wei Chang

Artificial Intelligence , Machine Learning , Natural Language Processing , Large Language Models

Model Stock: All we need is just a few fine-tuned models

Dong-Hwan Jang , Sangdoo Yun , Dongyoon Han

Artificial Intelligence , Machine Learning , Natural Language Processing , Large Language Models

Do Language Models Plan Ahead for Future Tokens?

Wilson Wu , John X. Morris , Lionel Levine

Artificial Intelligence , Machine Learning , Natural Language Processing , Large Language Models

Bigger is not Always Better: Scaling Properties of Latent Diffusion Models

Kangfu Mei , Zhengzhong Tu , Mauricio Delbracio , Hossein Talebi , Vishal M. Patel , Peyman Milanfar

Artificial Intelligence , Machine Learning , Natural Language Processing , Large Language Models

The Fine Line: Navigating Large Language Model Pretraining with Downstreaming Capability Analysis

Chen Yang , Junzhuo Li , Xinyao Niu , Xinrun Du , Songyang Gao , Haoran Zhang , Zhaoliang Chen , Xingwei Qu , Ruibin Yuan , Yizhi Li , Jiaheng Liu , Stephen W. Huang , Shawn Yue , Ge Zhang

Artificial Intelligence , Machine Learning , Natural Language Processing , Large Language Models

Diffusion-RWKV: Scaling RWKV-Like Architectures for Diffusion Models

Zhengcong Fei , Mingyuan Fan , Changqian Yu , Debang Li , Junshi Huang

Artificial Intelligence , Machine Learning , Natural Language Processing , Large Language Models

Mixture-of-Depths: Dynamically allocating compute in transformer-based language models

David Raposo , Sam Ritter , Blake Richards , Timothy Lillicrap , Peter Conway Humphreys , Adam Santoro

Artificial Intelligence , Machine Learning , Natural Language Processing , Large Language Models

LongICLBench: Long-context LLMs Struggle with Long In-context Learning

Tianle Li , Ge Zhang , Quy Duc Do , Xiang Yue , Wenhu Chen

Artificial Intelligence , Machine Learning , Natural Language Processing , Large Language Models

Emergent Abilities in Reduced-Scale Generative Language Models

Sherin Muckatira , Vijeta Deshpande , Vladislav Lialin , Anna Rumshisky

Artificial Intelligence , Machine Learning , Natural Language Processing , Large Language Models

Jailbreaking Leading Safety-Aligned LLMs with Simple Adaptive Attacks

Maksym Andriushchenko , Francesco Croce , Nicolas Flammarion

Artificial Intelligence , Machine Learning , Natural Language Processing , Large Language Models

On the Scalability of Diffusion-based Text-to-Image Generation

Hao Li , Yang Zou , Ying Wang , Orchid Majumder , Yusheng Xie , R. Manmatha , Ashwin Swaminathan , Zhuowen Tu , Stefano Ermon , Stefano Soatto

Artificial Intelligence , Machine Learning , Natural Language Processing , Large Language Models

BAdam: A Memory Efficient Full Parameter Optimization Method for Large Language Models

Qijun Luo , Hengxu Yu , Xiao Li

Artificial Intelligence , Machine Learning , Natural Language Processing , Large Language Models

Faster Diffusion via Temporal Attention Decomposition

Haozhe Liu , Wentian Zhang , Jinheng Xie , Francesco Faccio , Mengmeng Xu , Tao Xiang , Mike Zheng Shou , Juan-Manuel Perez-Rua , Jürgen Schmidhuber

Artificial Intelligence , Machine Learning , Natural Language Processing , Large Language Models

Training LLMs over Neurally Compressed Text

Brian Lester , Jaehoon Lee , Alex Alemi , Jeffrey Pennington , Adam Roberts , Jascha Sohl-Dickstein , Noah Constant

Artificial Intelligence , Machine Learning , Natural Language Processing , Large Language Models

CantTalkAboutThis: Aligning Language Models to Stay on Topic in Dialogues

Makesh Sreedhar , Traian Rebedea , Shaona Ghosh , Jiaqi Zeng , Christopher Parisien

Artificial Intelligence , Machine Learning , Natural Language Processing , Large Language Models

ReFT: Representation Finetuning for Language Models

Zhengxuan Wu , Aryaman Arora , Zheng Wang , Atticus Geiger , Dan Jurafsky , Christopher D. Manning , Christopher Potts

Artificial Intelligence , Machine Learning , Natural Language Processing , Large Language Models

Verifiable by Design: Aligning Language Models to Quote from Pre-Training Data

Jingyu Zhang , Marc Marone , Tianjian Li , Benjamin Van Durme , Daniel Khashabi

Artificial Intelligence , Machine Learning , Natural Language Processing , Large Language Models

Sigma: Siamese Mamba Network for Multi-Modal Semantic Segmentation

Zifu Wan , Pingping Zhang , Yuhao Wang , Silong Yong , Simon Stepputtis , Katia Sycara , Yaqi Xie

Artificial Intelligence , Machine Learning , Natural Language Processing , Large Language Models

AutoCodeRover: Autonomous Program Improvement

Yuntong Zhang , Haifeng Ruan , Zhiyu Fan , Abhik Roychoudhury

Artificial Intelligence , Machine Learning , Natural Language Processing , Large Language Models

Eagle and Finch: RWKV with Matrix-Valued States and Dynamic Recurrence

Bo Peng , Daniel Goldstein , Quentin Anthony , Alon Albalak , Eric Alcaide , Stella Biderman , Eugene Cheah , Xingjian Du , Teddy Ferdinan , Haowen Hou , Przemysław Kazienko , Kranthi Kiran GV , Jan Kocoń , Bartłomiej Koptyra , Satyapriya Krishna , Ronald McClelland Jr. , Jiaju Lin , Niklas Muennighoff , Fares Obeid , Atsushi Saito , Guangyu Song , Haoqin Tu , Cahya Wirawan , Stanisław Woźniak , Ruichong Zhang , Bingchen Zhao , Qihang Zhao , Peng Zhou , Jian Zhu , Rui-Jie Zhu

Artificial Intelligence , Machine Learning , Natural Language Processing , Large Language Models

CodecLM: Aligning Language Models with Tailored Synthetic Data

Zifeng Wang , Chun-Liang Li , Vincent Perot , Long T. Le , Jin Miao , Zizhao Zhang , Chen-Yu Lee , Tomas Pfister

Artificial Intelligence , Machine Learning , Natural Language Processing , Large Language Models

MiniCPM: Unveiling the Potential of Small Language Models with Scalable Training Strategies

Shengding Hu , Yuge Tu , Xu Han , Chaoqun He , Ganqu Cui , Xiang Long , Zhi Zheng , Yewei Fang , Yuxiang Huang , Weilin Zhao , Xinrong Zhang , Zheng Leng Thai , Kaihuo Zhang , Chongyi Wang , Yuan Yao , Chenyang Zhao , Jie Zhou , Jie Cai , Zhongwu Zhai , Ning Ding , Chao Jia , Guoyang Zeng , Dahai Li , Zhiyuan Liu , Maosong Sun

Artificial Intelligence , Machine Learning , Natural Language Processing , Large Language Models

Elephants Never Forget: Memorization and Learning of Tabular Data in Large Language Models

Sebastian Bordt , Harsha Nori , Vanessa Rodrigues , Besmira Nushi , Rich Caruana

Artificial Intelligence , Machine Learning , Natural Language Processing , Large Language Models

LLM2Vec: Large Language Models Are Secretly Powerful Text Encoders

Parishad BehnamGhader , Vaibhav Adlakha , Marius Mosbach , Dzmitry Bahdanau , Nicolas Chapados , Siva Reddy

Artificial Intelligence , Machine Learning , Natural Language Processing , Large Language Models

Adapting LLaMA Decoder to Vision Transformer

Jiahao Wang , Wenqi Shao , Mengzhao Chen , Chengyue Wu , Yong Liu , Taiqiang Wu , Kaipeng Zhang , Songyang Zhang , Kai Chen , Ping Luo

Artificial Intelligence , Machine Learning , Natural Language Processing , Large Language Models

Leave No Context Behind: Efficient Infinite Context Transformers with Infini-attention

Tsendsuren Munkhdalai , Manaal Faruqui , Siddharth Gopal

Artificial Intelligence , Machine Learning , Natural Language Processing , Large Language Models

LLoCO: Learning Long Contexts Offline

Sijun Tan , Xiuyu Li , Shishir Patil , Ziyang Wu , Tianjun Zhang , Kurt Keutzer , Joseph E. Gonzalez , Raluca Ada Popa

Artificial Intelligence , Machine Learning , Natural Language Processing , Large Language Models

JetMoE: Reaching Llama2 Performance with 0.1M Dollars

Yikang Shen , Zhen Guo , Tianle Cai , Zengyi Qin

Artificial Intelligence , Machine Learning , Natural Language Processing , Large Language Models

Best Practices and Lessons Learned on Synthetic Data

Ruibo Liu , Jerry Wei , Fangyu Liu , Chenglei Si , Yanzhe Zhang , Jinmeng Rao , Steven Zheng , Daiyi Peng , Diyi Yang , Denny Zhou , Andrew M. Dai

Artificial Intelligence , Machine Learning , Natural Language Processing , Large Language Models

RHO-1: Not All Tokens Are What You Need

Zhenghao Lin , Zhibin Gou , Yeyun Gong , Xiao Liu , Yelong Shen , Ruochen Xu , Chen Lin , Yujiu Yang , Jian Jiao , Nan Duan , Weizhu Chen

Artificial Intelligence , Machine Learning , Natural Language Processing , Large Language Models

Inheritune: Training Smaller Yet More Attentive Language Models

Sunny Sanyal , Ravid Shwartz-Ziv , Alex Dimakis , Sujay Sanghavi

Artificial Intelligence , Machine Learning , Natural Language Processing , Large Language Models

Dataset Reset Policy Optimization for RLHF

Jonathan D. Chang , Wenhao Zhan , Owen Oertell , Kianté Brantley , Dipendra Misra , Jason D. Lee , Wen Sun

Artificial Intelligence , Machine Learning , Natural Language Processing , Large Language Models

LLM In-Context Recall is Prompt Dependent

Daniel Machlab , Rick Battle

Artificial Intelligence , Machine Learning , Natural Language Processing , Large Language Models

State Space Model for New-Generation Network Alternative to Transformers: A Survey

Xiao Wang , Shiao Wang , Yuhe Ding , Yuehang Li , Wentao Wu , Yao Rong , Weizhe Kong , Ju Huang , Shihao Li , Haoxiang Yang , Ziwen Wang , Bo Jiang , Chenglong Li , Yaowei Wang , Yonghong Tian , Jin Tang

Artificial Intelligence , Machine Learning , Natural Language Processing , Large Language Models

Chinchilla Scaling: A replication attempt

Tamay Besiroglu , Ege Erdil , Matthew Barnett , Josh You

Artificial Intelligence , Machine Learning , Natural Language Processing , Large Language Models

Learn Your Reference Model for Real Good Alignment

Alexey Gorbatovski , Boris Shaposhnikov , Alexey Malakhov , Nikita Surnachev , Yaroslav Aksenov , Ian Maksimov , Nikita Balagansky , Daniil Gavrilov

Artificial Intelligence , Machine Learning , Natural Language Processing , Large Language Models

Is DPO Superior to PPO for LLM Alignment? A Comprehensive Study

Shusheng Xu , Wei Fu , Jiaxuan Gao , Wenjie Ye , Weilin Liu , Zhiyu Mei , Guangju Wang , Chao Yu , Yi Wu

Artificial Intelligence , Machine Learning , Natural Language Processing , Large Language Models

Scaling (Down) CLIP: A Comprehensive Analysis of Data, Architecture, and Training Strategies

Zichao Li , Cihang Xie , Ekin Dogus Cubuk

Artificial Intelligence , Machine Learning , Natural Language Processing , Large Language Models

ClashEval: Quantifying the tug-of-war between an LLM's internal prior and external evidence

Kevin Wu , Eric Wu , James Zou

Artificial Intelligence , Machine Learning , Natural Language Processing , Large Language Models

The Survey of Retrieval-Augmented Text Generation in Large Language Models

Yizheng Huang , Jimmy X. Huang

Artificial Intelligence , Machine Learning , Natural Language Processing , Large Language Models

When LLMs are Unfit Use FastFit: Fast and Effective Text Classification with Many Classes

Asaf Yehudai , Elron Bandel

Artificial Intelligence , Machine Learning , Natural Language Processing , Large Language Models

Toward Self-Improvement of LLMs via Imagination, Searching, and Criticizing

Ye Tian , Baolin Peng , Linfeng Song , Lifeng Jin , Dian Yu , Lei Han , Haitao Mi , Dong Yu

Artificial Intelligence , Machine Learning , Natural Language Processing , Large Language Models

OpenBezoar: Small, Cost-Effective and Open Models Trained on Mixes of Instruction Data

Chandeepa Dissanayake , Lahiru Lowe , Sachith Gunasekara , Yasiru Ratnayake

Artificial Intelligence , Machine Learning , Natural Language Processing , Large Language Models

The Instruction Hierarchy: Training LLMs to Prioritize Privileged Instructions

Eric Wallace , Kai Xiao , Reimar Leike , Lilian Weng , Johannes Heidecke , Alex Beutel

Artificial Intelligence , Machine Learning , Natural Language Processing , Large Language Models

An Empirical Study of LLaMA3 Quantization: From LLMs to MLLMs

Wei Huang , Xingyu Zheng , Xudong Ma , Haotong Qin , Chengtao Lv , Hong Chen , Jie Luo , Xiaojuan Qi , Xianglong Liu , Michele Magno

Artificial Intelligence , Machine Learning , Natural Language Processing , Large Language Models

Phi-3 Technical Report: A Highly Capable Language Model Locally on Your Phone

Artificial Intelligence , Machine Learning , Natural Language Processing , Large Language Models

OpenELM: An Efficient Language Model Family with Open Training and Inference Framework

Sachin Mehta , Mohammad Hossein Sekhavat , Qingqing Cao , Maxwell Horton , Yanzi Jin , Chenfan Sun , Iman Mirzadeh , Mahyar Najibi , Dmitry Belenko , Peter Zatloukal , Mohammad Rastegari

Artificial Intelligence , Machine Learning , Natural Language Processing , Large Language Models

NExT: Teaching Large Language Models to Reason about Code Execution

Ansong Ni , Miltiadis Allamanis , Arman Cohan , Yinlin Deng , Kensen Shi , Charles Sutton , Pengcheng Yin

Artificial Intelligence , Machine Learning , Natural Language Processing , Large Language Models

Multi-Head Mixture-of-Experts

Xun Wu , Shaohan Huang , Wenhui Wang , Furu Wei

Artificial Intelligence , Machine Learning , Natural Language Processing , Large Language Models

Graph Machine Learning in the Era of Large Language Models (LLMs)

Wenqi Fan , Shijie Wang , Jiani Huang , Zhikai Chen , Yu Song , Wenzhuo Tang , Haitao Mao , Hui Liu , Xiaorui Liu , Dawei Yin , Qing Li

Artificial Intelligence , Machine Learning , Natural Language Processing , Large Language Models

Retrieval Head Mechanistically Explains Long-Context Factuality

Wenhao Wu , Yizhong Wang , Guangxuan Xiao , Hao Peng , Yao Fu

Artificial Intelligence , Machine Learning , Natural Language Processing , Large Language Models

LayerSkip: Enabling Early Exit Inference and Self-Speculative Decoding

Mostafa Elhoushi , Akshat Shrivastava , Diana Liskovich , Basil Hosmer , Bram Wasti , Liangzhen Lai , Anas Mahmoud , Bilge Acun , Saurabh Agrawal , Ahmed Roman , Ahmed A Aly , Beidi Chen , Carole-Jean Wu

Artificial Intelligence , Machine Learning , Natural Language Processing , Large Language Models

Make Your LLM Fully Utilize the Context

Shengnan An , Zexiong Ma , Zeqi Lin , Nanning Zheng , Jian-Guang Lou

Artificial Intelligence , Machine Learning , Natural Language Processing , Large Language Models

LoRA Land: 310 Fine-tuned LLMs that Rival GPT-4, A Technical Report

Justin Zhao , Timothy Wang , Wael Abid , Geoffrey Angus , Arnav Garg , Jeffery Kinnison , Alex Sherstinsky , Piero Molino , Travis Addair , Devvret Rishi

Artificial Intelligence , Machine Learning , Natural Language Processing , Large Language Models

Better & Faster Large Language Models via Multi-token Prediction

Fabian Gloeckle , Badr Youbi Idrissi , Baptiste Rozière , David Lopez-Paz , Gabriel Synnaeve

Artificial Intelligence , Machine Learning , Natural Language Processing , Large Language Models

RAG and RAU: A Survey on Retrieval-Augmented Language Model in Natural Language Processing

Yucheng Hu , Yuxing Lu

Artificial Intelligence , Machine Learning , Natural Language Processing , Large Language Models

A Primer on the Inner Workings of Transformer-Based Language Models

Javier Ferrando , Gabriele Sarti , Arianna Bisazza , Marta R. Costa-jussà

Artificial Intelligence , Machine Learning , Natural Language Processing , Large Language Models

When to Retrieve: Teaching LLMs to Utilize Information Retrieval Effectively

Tiziano Labruna , Jon Ander Campos , Gorka Azkune

Artificial Intelligence , Machine Learning , Natural Language Processing , Large Language Models

KAN: Kolmogorov-Arnold Networks

Ziming Liu , Yixuan Wang , Sachin Vaidya , Fabian Ruehle , James Halverson , Marin Soljačić , Thomas Y. Hou , Max Tegmark

Artificial Intelligence , Machine Learning , Natural Language Processing , Large Language Models

Is Bigger Edit Batch Size Always Better? - An Empirical Study on Model Editing with Llama-3

Junsang Yoon , Akshat Gupta , Gopala Anumanchipalli

Artificial Intelligence , Machine Learning , Natural Language Processing , Large Language Models

Self-Play Preference Optimization for Language Model Alignment

Yue Wu , Zhiqing Sun , Huizhuo Yuan , Kaixuan Ji , Yiming Yang , Quanquan Gu

Artificial Intelligence , Machine Learning , Natural Language Processing , Large Language Models

A Careful Examination of Large Language Model Performance on Grade School Arithmetic

Hugh Zhang , Jeff Da , Dean Lee , Vaughn Robinson , Catherine Wu , Will Song , Tiffany Zhao , Pranav Raja , Charlotte Zhuang , Dylan Slack , Qin Lyu , Sean Hendryx , Russell Kaplan , Michele (Mike) Lunati , Summer Yue

Artificial Intelligence , Machine Learning , Natural Language Processing , Large Language Models

Prometheus 2: An Open Source Language Model Specialized in Evaluating Other Language Models

Seungone Kim , Juyoung Suk , Shayne Longpre , Bill Yuchen Lin , Jamin Shin , Sean Welleck , Graham Neubig , Moontae Lee , Kyungjae Lee , Minjoon Seo

Artificial Intelligence , Machine Learning , Natural Language Processing , Large Language Models

What matters when building vision-language models?

Hugo Laurençon , Léo Tronchon , Matthieu Cord , Victor Sanh

Artificial Intelligence , Machine Learning , Natural Language Processing , Large Language Models

Is Flash Attention Stable?

Alicia Golden , Samuel Hsia , Fei Sun , Bilge Acun , Basil Hosmer , Yejin Lee , Zachary DeVito , Jeff Johnson , Gu-Yeon Wei , David Brooks , Carole-Jean Wu

Artificial Intelligence , Machine Learning , Natural Language Processing , Large Language Models

vAttention: Dynamic Memory Management for Serving LLMs without PagedAttention

Ramya Prabhu , Ajay Nayak , Jayashree Mohan , Ramchandran Ramjee , Ashish Panwar

Artificial Intelligence , Machine Learning , Natural Language Processing , Large Language Models

xLSTM: Extended Long Short-Term Memory

Maximilian Beck , Korbinian Pöppel , Markus Spanring , Andreas Auer , Oleksandra Prudnikova , Michael Kopp , Günter Klambauer , Johannes Brandstetter , Sepp Hochreiter

Artificial Intelligence , Machine Learning , Natural Language Processing , Large Language Models