Models AI Projects
Daily ranking of open-source AI repositories in the Models category.
The Models category tracks 1509 repositories with 5609029 total GitHub stars.
- google-research/tabfm - (969 stars, Python, Models)
- google-research/timesfm - TimesFM (Time Series Foundation Model) is a pretrained time-series foundation model developed by Google Research for time-series forecasting. (26450 stars, Python, Models)
- zai-org/GLM-5 - GLM-5: From Vibe Coding to Agentic Engineering (6121 stars, Unknown, Models)
- NVIDIA/cosmos - NVIDIA Cosmos is an open platform of world models, datasets, and tools that enables developers to build Physical AI for robots, autonomous vehicles, sma... (10842 stars, Jupyter Notebook, Models)
- OpenBMB/VoxCPM - VoxCPM2: Tokenizer-Free TTS for Multilingual Speech Generation, Creative Voice Design, and True-to-Life Cloning (32385 stars, Python, Models)
- openai/whisper - Robust Speech Recognition via Large-Scale Weak Supervision (104074 stars, Python, Models)
- deepseek-ai/DeepSeek-LLM - DeepSeek LLM: Let there be answers (7098 stars, Makefile, Models)
- ultralytics/ultralytics - Ultralytics YOLO26, YOLO11, YOLOv8 — object detection, instance segmentation, semantic segmentation, image classification, pose estimation, object tracking (59079 stars, Python, Models)
- shiyu-coder/Kronos - Kronos: A Foundation Model for the Language of Financial Markets (31721 stars, Python, Models)
- netease-youdao/Confucius4-TTS - Confucius4-TTS: a Multilingual and Cross-Lingual Zero-Shot TTS Engine (579 stars, Python, Models)
- kairos-agi/kairos - Official code for world model Kairos (1270 stars, Python, Models)
- facebookresearch/brain2qwerty - Non-invasive decoding of typed sentences from MEG and EEG brain recordings using a convolutional encoder, transformer, and character-level language model. (669 stars, Python, Models)
- NVlabs/Eagle - Eagle: Frontier Vision-Language Models with Data-Centric Strategies (3048 stars, Python, Models)
- QwenLM/Qwen-AgentWorld - Qwen-AgentWorld: Language World Models for General Agents (735 stars, Python, Models)
- OpenMOSS/MOSS-TTS-Nano - MOSS-TTS-Nano is an open-source multilingual tiny speech generation model from MOSI.AI and the OpenMOSS team. With only 0.1B parameters, it is designed... (3838 stars, Python, Models)
- FunAudioLLM/CosyVoice - Multilingual large voice generation model with full-stack inference, training, and deployment capabilities. (21954 stars, Python, Models)
- facebookresearch/vggt-omega - [CVPR 2026 Oral] VGGT Omega (3331 stars, Python, Models)
- xinntao/Real-ESRGAN - Real-ESRGAN aims to develop practical algorithms for general image/video restoration. (36012 stars, Python, Models)
- modelscope/FunASR - Industrial-grade speech recognition toolkit: 170x realtime, 50+ languages, speaker diarization, emotion detection, streaming, and OpenAI-compatible API. (18821 stars, Python, Models)
- k2-fsa/sherpa-onnx - Speech-to-text, text-to-speech, speaker diarization, speech enhancement, source separation, and VAD using next-gen Kaldi with onnxruntime without Intern... (13353 stars, C++, Models)
- facebookresearch/sam3 - The repository provides code for running inference and finetuning with the Meta Segment Anything Model 3 (SAM 3), links for downloading the trained mode... (10802 stars, Python, Models)
- SakanaAI/fugu - (704 stars, Shell, Models)
- index-tts/index-tts - An Industrial-Level Controllable and Efficient Zero-Shot Text-To-Speech System (21618 stars, Python, Models)
- supertone-inc/supertonic - Lightning-Fast, On-Device, Multilingual TTS — running natively via ONNX. (12868 stars, Swift, Models)
- fishaudio/fish-speech - SOTA Open Source TTS (31097 stars, Python, Models)
- Wan-Video/Wan2.2 - Wan: Open and Advanced Large-Scale Video Generative Models (16506 stars, Python, Models)
- PriorLabs/TabPFN - ⚡ TabPFN: Foundation Model for Tabular Data ⚡ (7503 stars, Python, Models)
- MeiGen-AI/InfiniteTalk - Unlimited-length talking video generation that supports image-to-video and video-to-video generation (7318 stars, Python, Models)
- QwenLM/Qwen3-TTS - Qwen3-TTS is an open-source series of TTS models developed by the Qwen team at Alibaba Cloud, supporting stable, expressive, and streaming speech genera... (12261 stars, Python, Models)
- rhasspy/piper - A fast, local neural text to speech system (11185 stars, C++, Models)
- FunAudioLLM/SenseVoice - Multilingual speech understanding: ASR + emotion recognition + audio event detection. 50+ languages, 15x faster than Whisper, non-autoregressive. (8757 stars, C, Models)
- microsoft/BitNet - Official inference framework for 1-bit LLMs (39594 stars, Python, Models)
- 2noise/ChatTTS - A generative speech model for daily dialogue. (39549 stars, Python, Models)
- lucas-maes/le-wm - Official code base for LeWorldModel: Stable End-to-End Joint-Embedding Predictive Architecture from Pixels (4037 stars, Python, Models)
- NVlabs/GR00T-WholeBodyControl - Welcome to GR00T Whole-Body Control (WBC)! This is a unified platform for developing and deploying advanced humanoid controllers. This includes: Decoupl... (2788 stars, Python, Models)
- kepengxu/PGTFormer - [IJCAI'24] Beyond Alignment: Blind Video Face Restoration via Parsing-Guided Temporal-Coherent Transformer (624 stars, Python, Models)
- AMAP-ML/FluxText - Implementation of "FLUX-Text: A Simple and Advanced Diffusion Transformer Baseline for Scene Text Editing" (588 stars, Python, Models)
- ultralytics/yolov5 - Ultralytics YOLOv5 in PyTorch > ONNX > CoreML > TFLite (57616 stars, Python, Models)
- TencentARC/GFPGAN - GFPGAN aims to develop practical algorithms for real-world face restoration. (37506 stars, Python, Models)
- kyegomez/OpenMythos - A theoretical reconstruction of the Claude Mythos architecture, built from first principles using the available research literature. (14578 stars, Python, Models)
- facebookresearch/vggt - [CVPR 2025 Best Paper Award] VGGT: Visual Geometry Grounded Transformer (13656 stars, Python, Models)
- ByteDance-Seed/Depth-Anything-3 - Depth Anything 3 (5690 stars, Python, Models)
- WeiboAI/VibeThinker - Tiny Model, Big Logic: Diversity-Driven Optimization Elicits Large-Model Reasoning Ability in VibeThinker-1.5B (1448 stars, Python, Models)
- bytedance/Bernini - Bernini is a unified framework for video generation and editing that combines an MLLM-based semantic planner with a DiT-based renderer. (1014 stars, Python, Models)
- datalab-to/lift - Extract structured data from documents quickly and accurately. (707 stars, Python, Models)
- facebookresearch/segment-anything - The repository provides code for running inference with the Segment Anything Model (SAM), links for downloading the trained model checkpoints, and exampl... (54455 stars, Jupyter Notebook, Models)
- openai/CLIP - CLIP (Contrastive Language-Image Pretraining): predicts the most relevant text snippet given an image (33910 stars, Jupyter Notebook, Models)
- facebookresearch/sam2 - The repository provides code for running inference with the Meta Segment Anything Model 2 (SAM 2), links for downloading the trained model checkpoints,... (19465 stars, Jupyter Notebook, Models)
- microsoft/fara - Fara-7B: An Efficient Agentic Model for Computer Use (5968 stars, Python, Models)
- dreamzero0/dreamzero - Code to pretrain, fine-tune, and evaluate DreamZero and run sim & real-world evals (2385 stars, Python, Models)
- sapientinc/HRM-Text - HRM-Text is a 1B text generation model based on the HRM architecture, strengthened by task completion and latent space reasoning. (1611 stars, Python, Models)
- meta-llama/llama - Inference code for Llama models (59495 stars, Python, Models)
- deepinsight/insightface - State-of-the-art 2D and 3D Face Analysis Project (29141 stars, Python, Models)
- lucidrains/vit-pytorch - Implementation of Vision Transformer, a simple way to achieve SOTA in vision classification with only a single transformer encoder, in PyTorch (25396 stars, Python, Models)
- deepseek-ai/DeepSeek-Coder - DeepSeek Coder: Let the Code Write Itself (23803 stars, Python, Models)
- QwenLM/Qwen - The official repo of Qwen (通义千问) chat & pretrained large language model proposed by Alibaba Cloud. (21380 stars, Python, Models)
- kyutai-labs/moshi - Moshi is a speech-text foundation model and full-duplex spoken dialogue framework. It uses Mimi, a state-of-the-art streaming neural audio codec. (10506 stars, Python, Models)
- IDEA-Research/GroundingDINO - [ECCV 2024] Official implementation of the paper "Grounding DINO: Marrying DINO with Grounded Pre-Training for Open-Set Object Detection" (10353 stars, Python, Models)
- NVIDIA/personaplex - PersonaPlex code. (10134 stars, Python, Models)
- Lightricks/LTX-2 - Official Python inference and LoRA trainer package for the LTX-2 audio–video generative model. (8058 stars, Python, Models)