Technical Articles
YOLOv9: Learning What You Want to Learn Using Programmable Gradient Information
Artificial Intelligence , Machine Learning , Natural Language Processing , Large Language Models
LongRoPE: Extending LLM Context Window Beyond 2 Million Tokens
Artificial Intelligence , Machine Learning , Natural Language Processing , Large Language Models
Large Language Models for Data Annotation and Synthesis: A Survey
Artificial Intelligence , Machine Learning , Natural Language Processing , Large Language Models
TinyLLaVA: A Framework of Small-scale Large Multimodal Models
Artificial Intelligence , Machine Learning , Natural Language Processing , Large Language Models
Back to Basics: Revisiting REINFORCE Style Optimization for Learning from Human Feedback in LLMs
Artificial Intelligence , Machine Learning , Natural Language Processing , Large Language Models
Genie: Generative Interactive Environments
Artificial Intelligence , Machine Learning , Natural Language Processing , Large Language Models
CARTE: Pretraining and Transfer for Tabular Learning
Artificial Intelligence , Machine Learning , Natural Language Processing , Large Language Models
The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits
Artificial Intelligence , Machine Learning , Natural Language Processing , Large Language Models
Sora Generates Videos with Stunning Geometrical Consistency
Artificial Intelligence , Machine Learning , Natural Language Processing , Large Language Models
When Scaling Meets LLM Finetuning: The Effect of Data, Model and Finetuning Method
Artificial Intelligence , Machine Learning , Natural Language Processing , Large Language Models
Griffin: Mixing Gated Linear Recurrences with Local Attention for Efficient Language Models
Artificial Intelligence , Machine Learning , Natural Language Processing , Large Language Models
Learning and Leveraging World Models in Visual Representation Learning
Artificial Intelligence , Machine Learning , Natural Language Processing , Large Language Models
SynCode: LLM Generation with Grammar Augmentation
Artificial Intelligence , Machine Learning , Natural Language Processing , Large Language Models
The Hidden Attention of Mamba Models
Artificial Intelligence , Machine Learning , Natural Language Processing , Large Language Models
Training-Free Pretrained Model Merging
Artificial Intelligence , Machine Learning , Natural Language Processing , Large Language Models
Vision-RWKV: Efficient and Scalable Visual Perception with RWKV-Like Architectures
Artificial Intelligence , Machine Learning , Natural Language Processing , Large Language Models
The WMDP Benchmark: Measuring and Reducing Malicious Use With Unlearning
Artificial Intelligence , Machine Learning , Natural Language Processing , Large Language Models
Evolution Transformer: In-Context Evolutionary Optimization
Artificial Intelligence , Machine Learning , Natural Language Processing , Large Language Models
Enhancing Vision-Language Pre-training with Rich Supervisions
Artificial Intelligence , Machine Learning , Natural Language Processing , Large Language Models
Scaling Rectified Flow Transformers for High-Resolution Image Synthesis
Artificial Intelligence , Machine Learning , Natural Language Processing , Large Language Models
Design2Code: Benchmarking Multimodal Code Generation for Automated Front-End Engineering
Artificial Intelligence , Machine Learning , Natural Language Processing , Large Language Models
ShortGPT: Layers in Large Language Models are More Redundant Than You Expect
Artificial Intelligence , Machine Learning , Natural Language Processing , Large Language Models
Backtracing: Retrieving the Cause of the Query
Artificial Intelligence , Machine Learning , Natural Language Processing , Large Language Models
Learning to Decode Collaboratively with Multiple Language Models
Artificial Intelligence , Machine Learning , Natural Language Processing , Large Language Models
SaulLM-7B: A pioneering Large Language Model for Law
Artificial Intelligence , Machine Learning , Natural Language Processing , Large Language Models
3D Diffusion Policy: Generalizable Visuomotor Policy Learning via Simple 3D Representations
Artificial Intelligence , Machine Learning , Natural Language Processing , Large Language Models
MedMamba: Vision Mamba for Medical Image Classification
Artificial Intelligence , Machine Learning , Natural Language Processing , Large Language Models
GaLore: Memory-Efficient LLM Training by Gradient Low-Rank Projection
Artificial Intelligence , Machine Learning , Natural Language Processing , Large Language Models
Stop Regressing: Training Value Functions via Classification for Scalable Deep RL
Artificial Intelligence , Machine Learning , Natural Language Processing , Large Language Models
How Far Are We from Intelligent Visual Deductive Reasoning?
Artificial Intelligence , Machine Learning , Natural Language Processing , Large Language Models
Common 7B Language Models Already Possess Strong Math Capabilities
Artificial Intelligence , Machine Learning , Natural Language Processing , Large Language Models
Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context
Artificial Intelligence , Machine Learning , Natural Language Processing , Large Language Models
Is Cosine-Similarity of Embeddings Really About Similarity?
Artificial Intelligence , Machine Learning , Natural Language Processing , Large Language Models
LLM4Decompile: Decompiling Binary Code with Large Language Models
Artificial Intelligence , Machine Learning , Natural Language Processing , Large Language Models
Algorithmic Progress in Language Models
Artificial Intelligence , Machine Learning , Natural Language Processing , Large Language Models
Stealing Part of a Production Language Model
Artificial Intelligence , Machine Learning , Natural Language Processing , Large Language Models
Chronos: Learning the Language of Time Series
Artificial Intelligence , Machine Learning , Natural Language Processing , Large Language Models
Simple and Scalable Strategies to Continually Pre-train Large Language Models
Artificial Intelligence , Machine Learning , Natural Language Processing , Large Language Models
Language models scale reliably with over-training and on downstream tasks
Artificial Intelligence , Machine Learning , Natural Language Processing , Large Language Models
BurstAttention: An Efficient Distributed Attention Framework for Extremely Long Sequences
Artificial Intelligence , Machine Learning , Natural Language Processing , Large Language Models
LocalMamba: Visual State Space Model with Windowed Selective Scan
Artificial Intelligence , Machine Learning , Natural Language Processing , Large Language Models
GiT: Towards Generalist Vision Transformer through Universal Language Interface
Artificial Intelligence , Machine Learning , Natural Language Processing , Large Language Models
MM1: Methods, Analysis & Insights from Multimodal LLM Pre-training
Artificial Intelligence , Machine Learning , Natural Language Processing , Large Language Models
RAFT: Adapting Language Model to Domain Specific RAG
Artificial Intelligence , Machine Learning , Natural Language Processing , Large Language Models
TnT-LLM: Text Mining at Scale with Large Language Models
Artificial Intelligence , Machine Learning , Natural Language Processing , Large Language Models
Decoding Compressed Trust: Scrutinizing the Trustworthiness of Efficient LLMs Under Compression
Artificial Intelligence , Machine Learning , Natural Language Processing , Large Language Models
Parameter Efficient Reinforcement Learning from Human Feedback
Artificial Intelligence , Machine Learning , Natural Language Processing , Large Language Models
Evaluating Reward Models for Language Modeling
Artificial Intelligence , Machine Learning , Natural Language Processing , Large Language Models
LlamaFactory: Unified Efficient Fine-Tuning of 100+ Language Models
Artificial Intelligence , Machine Learning , Natural Language Processing , Large Language Models
RakutenAI-7B: Extending Large Language Models for Japanese
Artificial Intelligence , Machine Learning , Natural Language Processing , Large Language Models
SiMBA: Simplified Mamba-based Architecture for Vision and Multivariate Time series
Artificial Intelligence , Machine Learning , Natural Language Processing , Large Language Models
Can Large Language Models Explore In-Context?
Artificial Intelligence , Machine Learning , Natural Language Processing , Large Language Models
LLM2LLM: Boosting LLMs with Novel Iterative Data Enhancement
Artificial Intelligence , Machine Learning , Natural Language Processing , Large Language Models
AIOS: LLM Agent Operating System
Artificial Intelligence , Machine Learning , Natural Language Processing , Large Language Models
The Unreasonable Ineffectiveness of the Deeper Layers
Artificial Intelligence , Machine Learning , Natural Language Processing , Large Language Models
BioMedLM: A 2.7B Parameter Language Model Trained On Biomedical Text
Artificial Intelligence , Machine Learning , Natural Language Processing , Large Language Models
ViTAR: Vision Transformer with Any Resolution
Artificial Intelligence , Machine Learning , Natural Language Processing , Large Language Models
Long-form factuality in large language models
Artificial Intelligence , Machine Learning , Natural Language Processing , Large Language Models
Mini-Gemini: Mining the Potential of Multi-modality Vision Language Models
Artificial Intelligence , Machine Learning , Natural Language Processing , Large Language Models
LISA: Layerwise Importance Sampling for Memory-Efficient Large Language Model Fine-Tuning
Artificial Intelligence , Machine Learning , Natural Language Processing , Large Language Models
Mechanistic Design and Scaling of Hybrid Architectures
Artificial Intelligence , Machine Learning , Natural Language Processing , Large Language Models
MagicLens: Self-Supervised Image Retrieval with Open-Ended Instructions
Artificial Intelligence , Machine Learning , Natural Language Processing , Large Language Models
Model Stock: All we need is just a few fine-tuned models
Artificial Intelligence , Machine Learning , Natural Language Processing , Large Language Models
Do Language Models Plan Ahead for Future Tokens?
Artificial Intelligence , Machine Learning , Natural Language Processing , Large Language Models
Bigger is not Always Better: Scaling Properties of Latent Diffusion Models
Artificial Intelligence , Machine Learning , Natural Language Processing , Large Language Models
The Fine Line: Navigating Large Language Model Pretraining with Downstreaming Capability Analysis
Artificial Intelligence , Machine Learning , Natural Language Processing , Large Language Models
Diffusion-RWKV: Scaling RWKV-Like Architectures for Diffusion Models
Artificial Intelligence , Machine Learning , Natural Language Processing , Large Language Models
Mixture-of-Depths: Dynamically allocating compute in transformer-based language models
Artificial Intelligence , Machine Learning , Natural Language Processing , Large Language Models
LongICLBench: Long-context LLMs Struggle with Long In-context Learning
Artificial Intelligence , Machine Learning , Natural Language Processing , Large Language Models
Emergent Abilities in Reduced-Scale Generative Language Models
Artificial Intelligence , Machine Learning , Natural Language Processing , Large Language Models
Jailbreaking Leading Safety-Aligned LLMs with Simple Adaptive Attacks
Artificial Intelligence , Machine Learning , Natural Language Processing , Large Language Models
On the Scalability of Diffusion-based Text-to-Image Generation
Artificial Intelligence , Machine Learning , Natural Language Processing , Large Language Models
BAdam: A Memory Efficient Full Parameter Optimization Method for Large Language Models
Artificial Intelligence , Machine Learning , Natural Language Processing , Large Language Models
Faster Diffusion via Temporal Attention Decomposition
Artificial Intelligence , Machine Learning , Natural Language Processing , Large Language Models
Training LLMs over Neurally Compressed Text
Artificial Intelligence , Machine Learning , Natural Language Processing , Large Language Models
CantTalkAboutThis: Aligning Language Models to Stay on Topic in Dialogues
Artificial Intelligence , Machine Learning , Natural Language Processing , Large Language Models
ReFT: Representation Finetuning for Language Models
Artificial Intelligence , Machine Learning , Natural Language Processing , Large Language Models
Verifiable by Design: Aligning Language Models to Quote from Pre-Training Data
Artificial Intelligence , Machine Learning , Natural Language Processing , Large Language Models
Sigma: Siamese Mamba Network for Multi-Modal Semantic Segmentation
Artificial Intelligence , Machine Learning , Natural Language Processing , Large Language Models
AutoCodeRover: Autonomous Program Improvement
Artificial Intelligence , Machine Learning , Natural Language Processing , Large Language Models
Eagle and Finch: RWKV with Matrix-Valued States and Dynamic Recurrence
Artificial Intelligence , Machine Learning , Natural Language Processing , Large Language Models
CodecLM: Aligning Language Models with Tailored Synthetic Data
Artificial Intelligence , Machine Learning , Natural Language Processing , Large Language Models
MiniCPM: Unveiling the Potential of Small Language Models with Scalable Training Strategies
Artificial Intelligence , Machine Learning , Natural Language Processing , Large Language Models
Elephants Never Forget: Memorization and Learning of Tabular Data in Large Language Models
Artificial Intelligence , Machine Learning , Natural Language Processing , Large Language Models
LLM2Vec: Large Language Models Are Secretly Powerful Text Encoders
Artificial Intelligence , Machine Learning , Natural Language Processing , Large Language Models
Adapting LLaMA Decoder to Vision Transformer
Artificial Intelligence , Machine Learning , Natural Language Processing , Large Language Models
Leave No Context Behind: Efficient Infinite Context Transformers with Infini-attention
Artificial Intelligence , Machine Learning , Natural Language Processing , Large Language Models
LLoCO: Learning Long Contexts Offline
Artificial Intelligence , Machine Learning , Natural Language Processing , Large Language Models
JetMoE: Reaching Llama2 Performance with 0.1M Dollars
Artificial Intelligence , Machine Learning , Natural Language Processing , Large Language Models
Best Practices and Lessons Learned on Synthetic Data
Artificial Intelligence , Machine Learning , Natural Language Processing , Large Language Models
RHO-1: Not All Tokens Are What You Need
Artificial Intelligence , Machine Learning , Natural Language Processing , Large Language Models
Inheritune: Training Smaller Yet More Attentive Language Models
Artificial Intelligence , Machine Learning , Natural Language Processing , Large Language Models
Dataset Reset Policy Optimization for RLHF
Artificial Intelligence , Machine Learning , Natural Language Processing , Large Language Models
LLM In-Context Recall is Prompt Dependent
Artificial Intelligence , Machine Learning , Natural Language Processing , Large Language Models
State Space Model for New-Generation Network Alternative to Transformers: A Survey
Artificial Intelligence , Machine Learning , Natural Language Processing , Large Language Models
Chinchilla Scaling: A replication attempt
Artificial Intelligence , Machine Learning , Natural Language Processing , Large Language Models
Learn Your Reference Model for Real Good Alignment
Artificial Intelligence , Machine Learning , Natural Language Processing , Large Language Models
Is DPO Superior to PPO for LLM Alignment? A Comprehensive Study
Artificial Intelligence , Machine Learning , Natural Language Processing , Large Language Models
Scaling (Down) CLIP: A Comprehensive Analysis of Data, Architecture, and Training Strategies
Artificial Intelligence , Machine Learning , Natural Language Processing , Large Language Models
ClashEval: Quantifying the tug-of-war between an LLM's internal prior and external evidence
Artificial Intelligence , Machine Learning , Natural Language Processing , Large Language Models
The Survey of Retrieval-Augmented Text Generation in Large Language Models
Artificial Intelligence , Machine Learning , Natural Language Processing , Large Language Models
When LLMs are Unfit Use FastFit: Fast and Effective Text Classification with Many Classes
Artificial Intelligence , Machine Learning , Natural Language Processing , Large Language Models
Toward Self-Improvement of LLMs via Imagination, Searching, and Criticizing
Artificial Intelligence , Machine Learning , Natural Language Processing , Large Language Models
OpenBezoar: Small, Cost-Effective and Open Models Trained on Mixes of Instruction Data
Artificial Intelligence , Machine Learning , Natural Language Processing , Large Language Models
The Instruction Hierarchy: Training LLMs to Prioritize Privileged Instructions
Artificial Intelligence , Machine Learning , Natural Language Processing , Large Language Models
An Empirical Study of LLaMA3 Quantization: From LLMs to MLLMs
Artificial Intelligence , Machine Learning , Natural Language Processing , Large Language Models
Phi-3 Technical Report: A Highly Capable Language Model Locally on Your Phone
Artificial Intelligence , Machine Learning , Natural Language Processing , Large Language Models
OpenELM: An Efficient Language Model Family with Open Training and Inference Framework
Artificial Intelligence , Machine Learning , Natural Language Processing , Large Language Models
NExT: Teaching Large Language Models to Reason about Code Execution
Artificial Intelligence , Machine Learning , Natural Language Processing , Large Language Models
Graph Machine Learning in the Era of Large Language Models (LLMs)
Artificial Intelligence , Machine Learning , Natural Language Processing , Large Language Models
Retrieval Head Mechanistically Explains Long-Context Factuality
Artificial Intelligence , Machine Learning , Natural Language Processing , Large Language Models
LayerSkip: Enabling Early Exit Inference and Self-Speculative Decoding
Artificial Intelligence , Machine Learning , Natural Language Processing , Large Language Models
Make Your LLM Fully Utilize the Context
Artificial Intelligence , Machine Learning , Natural Language Processing , Large Language Models
LoRA Land: 310 Fine-tuned LLMs that Rival GPT-4, A Technical Report
Artificial Intelligence , Machine Learning , Natural Language Processing , Large Language Models
Better & Faster Large Language Models via Multi-token Prediction
Artificial Intelligence , Machine Learning , Natural Language Processing , Large Language Models
RAG and RAU: A Survey on Retrieval-Augmented Language Model in Natural Language Processing
Artificial Intelligence , Machine Learning , Natural Language Processing , Large Language Models
A Primer on the Inner Workings of Transformer-Based Language Models
Artificial Intelligence , Machine Learning , Natural Language Processing , Large Language Models
When to Retrieve: Teaching LLMs to Utilize Information Retrieval Effectively
Artificial Intelligence , Machine Learning , Natural Language Processing , Large Language Models
KAN: Kolmogorov–Arnold Networks
Artificial Intelligence , Machine Learning , Natural Language Processing , Large Language Models
Is Bigger Edit Batch Size Always Better? - An Empirical Study on Model Editing with Llama-3
Artificial Intelligence , Machine Learning , Natural Language Processing , Large Language Models
Self-Play Preference Optimization for Language Model Alignment
Artificial Intelligence , Machine Learning , Natural Language Processing , Large Language Models
A Careful Examination of Large Language Model Performance on Grade School Arithmetic
Artificial Intelligence , Machine Learning , Natural Language Processing , Large Language Models
Prometheus 2: An Open Source Language Model Specialized in Evaluating Other Language Models
Artificial Intelligence , Machine Learning , Natural Language Processing , Large Language Models
What matters when building vision-language models?
Artificial Intelligence , Machine Learning , Natural Language Processing , Large Language Models
vAttention: Dynamic Memory Management for Serving LLMs without PagedAttention
Artificial Intelligence , Machine Learning , Natural Language Processing , Large Language Models
xLSTM: Extended Long Short-Term Memory
Artificial Intelligence , Machine Learning , Natural Language Processing , Large Language Models