Data Science | Machine Learning with Python for Researchers
https://t.me/datasciencet

Admin: @HusseinSheikho

The Data Science and Python channel is for researchers and advanced programmers.

This channel is for Programmers, Coders, and Software Engineers.

0️⃣ Python
1️⃣ Data Science
2️⃣ Machine Learning
3️⃣ Data Visualization
4️⃣ Artificial Intelligence
5️⃣ Data Analysis
6️⃣ Statistics
7️⃣ Deep Learning
8️⃣ Programming Languages

✅ https://t.me/addlist/8_rRW2scgfRhOTc0

✅ https://t.me/Codeprogrammer
04/23/2025, 15:50
t.me/datasciencet/20559
NVIDIA introduces Describe Anything Model (DAM)

DAM is a new state-of-the-art model designed to generate rich, detailed descriptions of specific regions in images and videos. Users can mark these regions using points, boxes, scribbles, or masks.
DAM sets a new benchmark in multimodal understanding, with open-source code under the Apache license, a dedicated dataset, and a live demo available on Hugging Face.
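
A minimal sketch (not NVIDIA's code) of how such region prompts can be represented: a box or a few click points are rasterized into a binary mask that a DAM-style model would consume; the helper names are hypothetical.

```python
import numpy as np

def box_to_mask(h, w, box):
    """Rasterize an (x1, y1, x2, y2) box into a binary region mask."""
    mask = np.zeros((h, w), dtype=np.uint8)
    x1, y1, x2, y2 = box
    mask[y1:y2, x1:x2] = 1
    return mask

def points_to_mask(h, w, points, radius=5):
    """Rasterize click points by stamping small discs around each point."""
    mask = np.zeros((h, w), dtype=np.uint8)
    yy, xx = np.mgrid[:h, :w]
    for px, py in points:
        mask |= ((xx - px) ** 2 + (yy - py) ** 2 <= radius ** 2).astype(np.uint8)
    return mask

# A 480x640 image with one box prompt and two point prompts.
box_mask = box_to_mask(480, 640, (100, 50, 300, 200))
point_mask = points_to_mask(480, 640, [(150, 120), (220, 160)])
region_prompt = np.clip(box_mask + point_mask, 0, 1)  # binary mask fed to the model
```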

Explore more below:
Paper: https://lnkd.in/dZh82xtV
Project Page: https://lnkd.in/dcv9V2ZF
GitHub Repo: https://lnkd.in/dJB9Ehtb
Hugging Face Demo: https://lnkd.in/dXDb2MWU
Review: https://t.ly/la4JD

#NVIDIA #DescribeAnything #ComputerVision #MultimodalAI #DeepLearning #ArtificialIntelligence #MachineLearning #OpenSource #HuggingFace #GenerativeAI #VisualUnderstanding #Python #AIresearch

https://t.me/DataScienceT ✅
04/23/2025, 13:08
t.me/datasciencet/20558
Follow me on linkedin (important for you)

https://www.linkedin.com/in/hussein-sheikho-4a8187246
04/21/2025, 15:27
t.me/datasciencet/20557
Liquid: Language Models are Scalable Multi-modal Generators

5 Dec 2024 · Junfeng Wu, Yi Jiang, Chuofan Ma, Yuliang Liu, Hengshuang Zhao, Zehuan Yuan, Song Bai, Xiang Bai ·

We present Liquid, an auto-regressive generation paradigm that seamlessly integrates visual comprehension and generation by tokenizing images into discrete codes and learning these code embeddings alongside text tokens within a shared feature space for both vision and language. Unlike previous multimodal large language models (MLLMs), Liquid achieves this integration using a single large language model (LLM), eliminating the need for external pretrained visual embeddings such as CLIP. For the first time, Liquid uncovers a scaling law: the performance drop unavoidably brought by the unified training of visual and language tasks diminishes as the model size increases. Furthermore, the unified token space enables visual generation and comprehension tasks to mutually enhance each other, effectively removing the typical interference seen in earlier models. We show that existing LLMs can serve as strong foundations for Liquid, saving 100x in training costs while outperforming Chameleon in multimodal capabilities and maintaining language performance comparable to mainstream LLMs like LLAMA2. Liquid also outperforms models like SD v2.1 and SD-XL (FID of 5.47 on MJHQ-30K), excelling in both vision-language and text-only tasks. This work demonstrates that LLMs such as LLAMA3.2 and GEMMA2 are powerful multimodal generators, offering a scalable solution for enhancing both vision-language understanding and generation. The code and models will be released at https://github.com/FoundationVision/Liquid.
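
A minimal sketch of the core idea, under assumed sizes: image codes from a VQ tokenizer are offset into the same vocabulary as text tokens, so one embedding table and one output head serve both modalities (a real Liquid-style model would use a causal decoder LLM; the encoder below is only a stand-in).

```python
import torch
import torch.nn as nn

TEXT_VOCAB = 32_000      # assumed BPE text vocabulary size
IMAGE_CODEBOOK = 8_192   # assumed number of discrete VQ image codes
D_MODEL = 512

class SharedVocabLM(nn.Module):
    """One model over a unified vocabulary of text tokens and image codes."""
    def __init__(self):
        super().__init__()
        vocab = TEXT_VOCAB + IMAGE_CODEBOOK
        self.embed = nn.Embedding(vocab, D_MODEL)      # single table for both modalities
        layer = nn.TransformerEncoderLayer(D_MODEL, nhead=8, batch_first=True)
        self.backbone = nn.TransformerEncoder(layer, num_layers=2)  # stand-in for a causal LLM
        self.lm_head = nn.Linear(D_MODEL, vocab)       # single head predicts text or image codes

    def forward(self, ids):
        return self.lm_head(self.backbone(self.embed(ids)))

# Image codes are simply shifted into the shared id space: image_id = TEXT_VOCAB + code.
model = SharedVocabLM()
text_ids = torch.randint(0, TEXT_VOCAB, (1, 16))
image_ids = TEXT_VOCAB + torch.randint(0, IMAGE_CODEBOOK, (1, 64))
logits = model(torch.cat([text_ids, image_ids], dim=1))  # logits over both vocabularies
```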

Paper: https://arxiv.org/pdf/2412.04332v2.pdf

Code: https://github.com/foundationvision/liquid

https://t.me/DataScienceT ✅
04/21/2025, 12:49
t.me/datasciencet/20556
REPA-E: Unlocking VAE for End-to-End Tuning with Latent Diffusion Transformers

14 Apr 2025 · Xingjian Leng, Jaskirat Singh, Yunzhong Hou, Zhenchang Xing, Saining Xie, Liang Zheng ·

In this paper we tackle a fundamental question: "Can we train latent diffusion models together with the variational auto-encoder (VAE) tokenizer in an end-to-end manner?" Traditional deep-learning wisdom dictates that end-to-end training is often preferable when possible. However, for latent diffusion transformers, it is observed that training both the VAE and the diffusion model end-to-end with the standard diffusion loss is ineffective, even causing a degradation in final performance. We show that while the diffusion loss is ineffective, end-to-end training can be unlocked through the representation-alignment (REPA) loss -- allowing both the VAE and the diffusion model to be jointly tuned during training. Despite its simplicity, the proposed training recipe (REPA-E) shows remarkable performance, speeding up diffusion model training by over 17x and 45x over the REPA and vanilla training recipes, respectively. Interestingly, we observe that end-to-end tuning with REPA-E also improves the VAE itself, leading to improved latent space structure and downstream generation performance. In terms of final performance, our approach sets a new state-of-the-art, achieving FID of 1.26 and 1.83 with and without classifier-free guidance on ImageNet 256 x 256. Code is available at https://end2end-diffusion.github.io.
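
A minimal sketch of a representation-alignment term in the spirit of REPA, with illustrative shapes: intermediate diffusion-transformer features are projected and pulled toward features from a frozen pretrained encoder via negative cosine similarity, and this term is added to the usual diffusion loss.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def repa_style_loss(dit_feats, target_feats, proj):
    """Per-patch negative cosine similarity between projected DiT features and frozen targets."""
    pred = F.normalize(proj(dit_feats), dim=-1)
    tgt = F.normalize(target_feats.detach(), dim=-1)  # frozen encoder output, no gradient
    return -(pred * tgt).sum(dim=-1).mean()

proj = nn.Linear(768, 1024)            # DiT width -> encoder width (illustrative sizes)
h = torch.randn(2, 256, 768)           # intermediate diffusion-transformer features
enc = torch.randn(2, 256, 1024)        # stand-in for frozen DINOv2-style patch features
loss = repa_style_loss(h, enc, proj)   # total = diffusion_loss + lambda * loss
```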

Paper: https://arxiv.org/pdf/2504.10483v1.pdf

Code: https://github.com/End2End-Diffusion/REPA-E

Dataset: ImageNet

https://t.me/DataScienceT ✅
04/20/2025, 12:39
t.me/datasciencet/20555
📢 5-Day Generative AI Intensive Course with #Google is now available as a self-paced Learn Guide!

Access whitepapers, podcasts, code labs, & recorded livestreams. Additionally, there is a bonus assignment for you!
https://www.kaggle.com/learn-guide/5-day-genai

#GenerativeAI #GoogleAI #AICourse #SelfPacedLearning #MachineLearning #DeepLearning #Kaggle #AICommunity #TechEducation #AIforEveryone

⚡️ BEST DATA SCIENCE CHANNELS ON TELEGRAM 🌟
04/19/2025, 21:09
t.me/datasciencet/20554
🔥 General Attention-Based Object Detection 🔥

👉 GATE3D is a novel framework designed specifically for generalized monocular 3D object detection via weak supervision. GATE3D effectively bridges domain gaps by employing consistency losses between 2D and 3D predictions.
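
A minimal sketch of one way such a 2D-3D consistency loss can look, under assumed conventions: project the predicted 3D box center with the camera intrinsics and penalize its distance to the predicted 2D box center (this illustrates the idea, not the paper's exact formulation).

```python
import torch

def project(points_3d, K):
    """Project (N, 3) camera-frame points to pixels with intrinsics K (3x3)."""
    uvw = points_3d @ K.T
    return uvw[:, :2] / uvw[:, 2:3].clamp(min=1e-6)

def center_consistency_loss(center_3d, center_2d, K):
    """L1 distance between the projected 3D center and the predicted 2D center."""
    return torch.abs(project(center_3d, K) - center_2d).mean()

K = torch.tensor([[720.0, 0.0, 640.0], [0.0, 720.0, 360.0], [0.0, 0.0, 1.0]])
c3d = torch.tensor([[1.2, 0.3, 14.0]])   # predicted 3D box center (meters, camera frame)
c2d = torch.tensor([[702.0, 375.0]])     # predicted 2D box center (pixels)
print(center_consistency_loss(c3d, c2d, K))
```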

👉 Review: https://t.ly/O7wqH
👉 Paper: https://lnkd.in/dc5VTUj9
👉 Project: https://lnkd.in/dzrt-qQV

#3DObjectDetection #Monocular3D #DeepLearning #WeakSupervision #ComputerVision #AI #MachineLearning #GATE3D

⚡️ BEST DATA SCIENCE CHANNELS ON TELEGRAM 🌟
04/18/2025, 10:45
t.me/datasciencet/20553
🔥ENTER VIP FOR FREE! ENTRY 24 HOURS FREE!

LISA TRADER is the most successful trader of 2024. A week ago she finished a marathon in her VIP channel, turning $100 into $2,000 in just two weeks!

Entry to her channel normally costs $1,500. ENTRY IS FREE FOR 24 HOURS!

JOIN THE VIP CHANNEL NOW!
JOIN THE VIP CHANNEL NOW!
JOIN THE VIP CHANNEL NOW!
04/17/2025, 15:03
t.me/datasciencet/20552
Don't forget to attend this session!
04/17/2025, 08:11
t.me/datasciencet/20551
💥 Geo4D: VideoGen 4D Scene 💥

The Oxford VGG unveils Geo4D, a breakthrough in #videodiffusion for monocular 4D reconstruction. Trained only on synthetic data, Geo4D still achieves strong generalization to real-world scenarios. It outputs point maps, depth, and ray maps, setting a new #SOTA in dynamic scene reconstruction. Code is now released!

⚡️ Review: https://t.ly/X55Uj
⚡️ Paper: https://arxiv.org/pdf/2504.07961
⚡️ Project: https://geo4d.github.io/
⚡️ Code: https://github.com/jzr99/Geo4D

#Geo4D #4DReconstruction #DynamicScenes #OxfordVGG #ComputerVision #MachineLearning #DiffusionModels

⚡️ BEST DATA SCIENCE CHANNELS ON TELEGRAM 🌟
04/16/2025, 10:22
t.me/datasciencet/20550
Forget Coding; start Vibing! Tell AI what you want, and watch it build your dream website while you enjoy a cup of coffee.

Date: Thursday, April 17th at 9 PM IST

Register for FREE: https://lu.ma/4nczknky?tk=eAT3Bi

Limited FREE seats!
04/15/2025, 19:33
t.me/datasciencet/20549
🍄 4D Mocap Human-Object 🍄

Adobe unveils HUMOTO, a high-quality #dataset of human-object interactions designed for #motiongeneration, #computervision, and #robotics. It features over 700 sequences (7,875 seconds @ 30FPS) with interactions involving 63 precisely modeled objects and 72 articulated parts—a rich resource for researchers and developers in the field.

⚡️ Review: https://t.ly/lCof3
⚡️ Paper: https://lnkd.in/dVVBDd_c
⚡️ Project: https://lnkd.in/dwBcseDf

#HUMOTO #4DMocap #HumanObjectInteraction #AdobeResearch #AI #MachineLearning #PoseEstimation

⚡️ BEST DATA SCIENCE CHANNELS ON TELEGRAM 🌟
04/15/2025, 10:15
t.me/datasciencet/20548
GPT-ImgEval: A Comprehensive Benchmark for Diagnosing GPT4o in Image Generation

3 Apr 2025 · Zhiyuan Yan, Junyan Ye, Weijia Li, Zilong Huang, Shenghai Yuan, Xiangyang He, Kaiqing Lin, Jun He, Conghui He, Li Yuan ·

The recent breakthroughs in OpenAI's #GPT4o model have demonstrated surprisingly good capabilities in image generation and editing, resulting in significant excitement in the community. This technical report presents the first-look evaluation benchmark (named GPT-ImgEval), quantitatively and qualitatively diagnosing GPT-4o's performance across three critical dimensions: (1) generation quality, (2) editing proficiency, and (3) world knowledge-informed semantic synthesis. Across all three tasks, GPT-4o demonstrates strong performance, significantly surpassing existing methods in both image generation control and output quality, while also showcasing exceptional knowledge reasoning capabilities. Furthermore, based on the GPT-4o's generated data, we propose a classification-model-based approach to investigate the underlying architecture of GPT-4o, where our empirical results suggest the model consists of an auto-regressive (AR) backbone combined with a diffusion-based head for image decoding, rather than a VAR-like architecture. We also provide a complete speculation on GPT-4o's overall architecture. In addition, we conduct a series of analyses to identify and visualize GPT-4o's specific limitations and the synthetic artifacts commonly observed in its image generation. We also present a comparative study of multi-round image editing between GPT-4o and Gemini 2.0 Flash, and discuss the safety implications of GPT-4o's outputs, particularly their detectability by existing image forensic models. We hope that our work can offer valuable insight and provide a reliable benchmark to guide future research, foster reproducibility, and accelerate innovation in the field of image generation and beyond. The codes and datasets used for evaluating GPT-4o can be found at https://github.com/PicoTrex/GPT-ImgEval.

Paper: https://arxiv.org/pdf/2504.02782v1.pdf

Code: https://github.com/picotrex/gpt-imgeval

Dataset: MagicBrush - GenEval

⚡️ BEST DATA SCIENCE CHANNELS ON TELEGRAM 🌟
04/15/2025, 09:57
t.me/datasciencet/20547
🎓 2025 Top IT Certification – Free Study Materials Are Here!

🔥Whether you're preparing for #Cisco #AWS #PMP #Python #Excel #Google #Microsoft #AI or any other in-demand certification – SPOTO has got you covered!

📘 Download the FREE IT Certs Exam E-book:
👉 https://bit.ly/4lNVItV
🧠 Test Your IT Skills for FREE:
👉 https://bit.ly/4imEjW5
☁️ Download Free AI Materials :
👉 https://bit.ly/3F3lc5B

📞 Need 1-on-1 IT Exam Help? Contact Now:
👉 https://wa.link/k0vy3x
🌐 Join Our IT Study Group for Daily Updates & Tips:
👉 https://chat.whatsapp.com/E3Vkxa19HPO9ZVkWslBO8s
04/15/2025, 08:35
t.me/datasciencet/20545
Title of paper:
Audio-Visual Controlled Video Diffusion with Masked Selective State Spaces Modeling for Natural Talking Head Generation
Authors:
Fa-Ting Hong, Zunnan Xu, Zixiang Zhou, Jun Zhou, Xiu Li, Qin Lin, Qinglin Lu, Dan Xu
Description:
This paper introduces ACTalker, an end-to-end video diffusion framework designed for natural talking head generation with both multi-signal and single-signal control capabilities.
The framework employs a parallel Mamba structure with multiple branches, each utilizing a separate driving signal to control specific facial regions.
A gate mechanism is applied across all branches, providing flexible control over video generation.
To ensure natural coordination of the controlled video both temporally and spatially, the Mamba structure enables driving signals to manipulate feature tokens across both dimensions in each branch.
Additionally, a mask-drop strategy is introduced, allowing each driving signal to independently control its corresponding facial region within the Mamba structure, preventing control conflicts.
Experimental results demonstrate that this method produces natural-looking facial videos driven by diverse signals, and that the Mamba layer seamlessly integrates multiple driving modalities without conflict.
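
A toy sketch of the gated multi-branch idea (simple linear layers stand in for the Mamba branches; names and shapes are illustrative): each branch mixes its driving signal into the tokens, is masked to its facial region, and contributes through a learned gate; dropping a branch mimics the mask-drop strategy.

```python
import torch
import torch.nn as nn

class GatedBranches(nn.Module):
    """Per-branch residual updates, restricted to a facial region and combined via learned gates."""
    def __init__(self, dim, num_branches=2):
        super().__init__()
        self.branches = nn.ModuleList(nn.Linear(dim, dim) for _ in range(num_branches))
        self.gates = nn.Parameter(torch.zeros(num_branches))  # one learned gate per branch

    def forward(self, tokens, signals, region_masks, drop=()):
        out = tokens                                   # tokens: (B, N, D)
        for i, branch in enumerate(self.branches):
            if i in drop:                              # "mask-drop": skip a control branch
                continue
            update = branch(tokens + signals[i]) * region_masks[i]
            out = out + torch.sigmoid(self.gates[i]) * update
        return out

B, N, D = 2, 196, 64
mixer = GatedBranches(D)
signals = [torch.randn(B, N, D), torch.randn(B, N, D)]   # e.g., audio and expression features
masks = [torch.ones(B, N, 1), torch.ones(B, N, 1)]        # per-region masks (illustrative)
fused = mixer(torch.randn(B, N, D), signals, masks, drop={1})
```
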
Link of abstract paper:
https://arxiv.org/abs/2504.00000
Link of download paper:
https://arxiv.org/pdf/2504.00000.pdf
Code:
https://github.com/harlanhong/actalker
Datasets used in paper:
The paper does not specify the datasets used.
Hugging Face demo:
No Hugging Face demo available.
#ACTalker #TalkingHeadGeneration #VideoDiffusion #MultimodalControl #MambaStructure #DeepLearning #ComputerVision #AI #OpenSource
04/14/2025, 17:38
t.me/datasciencet/20544
📖 100 Essential Data Science Interview Questions

👨🏻‍💻 Preparing for a data science interview?
Reviewing fundamental questions is one of the best strategies for success. During the interview, it's crucial to communicate clearly and simply—especially when explaining complex models and data.
These 100 carefully selected questions will not only help you impress your interviewer but also boost your confidence throughout the interview process.

#DataScienceInterview #TechCareers #InterviewPreparation

⚡️ BEST DATA SCIENCE CHANNELS ON TELEGRAM 🌟
04/14/2025, 17:38
t.me/datasciencet/20543
Join for job and internship opportunities 👇👇
https://t.me/jobhuntcamp
04/14/2025, 17:20
t.me/datasciencet/20542
This channel is for Programmers, Coders, and Software Engineers.

0️⃣ Python
1️⃣ Data Science
2️⃣ Machine Learning
3️⃣ Data Visualization
4️⃣ Artificial Intelligence
5️⃣ Data Analysis
6️⃣ Statistics
7️⃣ Deep Learning
8️⃣ Programming Languages

✅ https://t.me/addlist/8_rRW2scgfRhOTc0

✅ https://t.me/Codeprogrammer
04/14/2025, 12:10
t.me/datasciencet/20541
Advances and Challenges in Foundation Agents: From Brain-Inspired Intelligence to Evolutionary, Collaborative, and Safe Systems

31 Mar 2025 · Bang Liu, Xinfeng Li, et al.

The advent of large language models (LLMs) has catalyzed a transformative shift in artificial intelligence, paving the way for advanced intelligent agents capable of sophisticated reasoning, robust perception, and versatile action across diverse domains. As these agents increasingly drive AI research and practical applications, their design, evaluation, and continuous improvement present intricate, multifaceted challenges. This survey provides a comprehensive overview, framing intelligent agents within a modular, brain-inspired architecture that integrates principles from cognitive science, neuroscience, and computational research. We structure our exploration into four interconnected parts. First, we delve into the modular foundation of intelligent agents, systematically mapping their cognitive, perceptual, and operational modules onto analogous human brain functionalities, and elucidating core components such as memory, world modeling, reward processing, and emotion-like systems. Second, we discuss self-enhancement and adaptive evolution mechanisms, exploring how agents autonomously refine their capabilities, adapt to dynamic environments, and achieve continual learning through automated optimization paradigms, including emerging AutoML and LLM-driven optimization strategies. Third, we examine collaborative and evolutionary multi-agent systems, investigating the collective intelligence emerging from agent interactions, cooperation, and societal structures, highlighting parallels to human social dynamics. Finally, we address the critical imperative of building safe, secure, and beneficial AI systems, emphasizing intrinsic and extrinsic security threats, ethical alignment, robustness, and practical mitigation strategies necessary for trustworthy real-world deployment.

Paper: https://arxiv.org/pdf/2504.01990v1.pdf

Code: https://github.com/foundationagents/awesome-foundation-agents

⚡️ BEST DATA SCIENCE CHANNELS ON TELEGRAM 🌟
04/14/2025, 09:41
t.me/datasciencet/20540
Less-to-More Generalization: Unlocking More Controllability by In-Context Generation

2 Apr 2025 · Shaojin Wu, Mengqi Huang, Wenxu Wu, Yufeng Cheng, Fei Ding, Qian He ·

Although subject-driven generation has been extensively explored in image generation due to its wide applications, it still has challenges in data scalability and subject expansibility. For the first challenge, moving from curating single-subject datasets to multiple-subject ones and scaling them is particularly difficult. For the second, most recent methods center on single-subject generation, making it hard to apply when dealing with multi-subject scenarios. In this study, we propose a highly-consistent data synthesis pipeline to tackle this challenge. This pipeline harnesses the intrinsic in-context generation capabilities of diffusion transformers and generates high-consistency multi-subject paired data. Additionally, we introduce UNO, which consists of progressive cross-modal alignment and universal rotary position embedding. It is a multi-image conditioned subject-to-image model iteratively trained from a text-to-image model. Extensive experiments show that our method can achieve high consistency while ensuring controllability in both single-subject and multi-subject driven generation.

Code: https://github.com/bytedance/uno

Dataset: DreamBooth (https://paperswithcode.com/dataset/dreambench)

⚡️ BEST DATA SCIENCE CHANNELS ON TELEGRAM 🌟
04/13/2025, 19:38
t.me/datasciencet/20539
Zep: A Temporal Knowledge Graph Architecture for Agent Memory

20 Jan 2025 · Preston Rasmussen, Pavlo Paliychuk, Travis Beauvais, Jack Ryan, Daniel Chalef ·

We introduce Zep, a novel memory layer service for AI agents that outperforms the current state-of-the-art system, MemGPT, in the Deep Memory Retrieval (DMR) benchmark. Additionally, Zep excels in more comprehensive and challenging evaluations than DMR that better reflect real-world enterprise use cases. While existing retrieval-augmented generation (RAG) frameworks for large language model (LLM)-based agents are limited to static document retrieval, enterprise applications demand dynamic knowledge integration from diverse sources including ongoing conversations and business data. Zep addresses this fundamental limitation through its core component Graphiti -- a temporally-aware knowledge graph engine that dynamically synthesizes both unstructured conversational data and structured business data while maintaining historical relationships. In the DMR benchmark, which the MemGPT team established as their primary evaluation metric, Zep demonstrates superior performance (94.8% vs 93.4%). Beyond DMR, Zep's capabilities are further validated through the more challenging LongMemEval benchmark, which better reflects enterprise use cases through complex temporal reasoning tasks. In this evaluation, Zep achieves substantial results with accuracy improvements of up to 18.5% while simultaneously reducing response latency by 90% compared to baseline implementations. These results are particularly pronounced in enterprise-critical tasks such as cross-session information synthesis and long-term context maintenance, demonstrating Zep's effectiveness for deployment in real-world applications.
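
A minimal sketch of what a temporally-aware fact store can look like (field names are illustrative, not Graphiti's actual schema): each edge carries a validity interval, new facts invalidate superseded ones, and queries are answered "as of" a given time.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import List, Optional

@dataclass
class TemporalEdge:
    subject: str
    predicate: str
    obj: str
    valid_from: datetime
    valid_to: Optional[datetime] = None   # None means still valid

@dataclass
class TemporalGraph:
    edges: List[TemporalEdge] = field(default_factory=list)

    def assert_fact(self, s, p, o, when):
        for e in self.edges:               # close out any currently valid fact with same (s, p)
            if e.subject == s and e.predicate == p and e.valid_to is None:
                e.valid_to = when
        self.edges.append(TemporalEdge(s, p, o, when))

    def query(self, s, p, at):
        return [e for e in self.edges
                if e.subject == s and e.predicate == p
                and e.valid_from <= at and (e.valid_to is None or at < e.valid_to)]

g = TemporalGraph()
g.assert_fact("alice", "works_at", "AcmeCorp", datetime(2023, 1, 1))
g.assert_fact("alice", "works_at", "Initech", datetime(2024, 6, 1))   # supersedes the old fact
print(g.query("alice", "works_at", datetime(2023, 7, 1))[0].obj)      # -> AcmeCorp
```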

Paper: https://arxiv.org/pdf/2501.13956v1.pdf

Code: https://github.com/getzep/graphiti

⚡️ BEST DATA SCIENCE CHANNELS ON TELEGRAM 🌟
04/13/2025, 09:35
t.me/datasciencet/20538
📚 Become a professional data scientist with these 17 resources!



1️⃣ Python libraries for machine learning

◀️ Introducing the best Python tools and packages for building ML models.

➖➖➖

2️⃣ Deep Learning Interactive Book

◀️ Learn deep learning concepts by combining text, math, code, and images.

➖➖➖

3️⃣ Anthology of Data Science Learning Resources

◀️ The best courses, books, and tools for learning data science.

➖➖➖

4️⃣ Implementing algorithms from scratch

◀️ Coding popular ML algorithms from scratch

➖➖➖

5️⃣ Machine Learning Interview Guide

◀️ Fully prepared for job interviews

➖➖➖

6️⃣ Real-world machine learning projects

◀️ Learning how to build and deploy models.

➖➖➖

7️⃣ Designing machine learning systems

◀️ How to design a scalable and stable ML system.

➖➖➖

8️⃣ Machine Learning Mathematics

◀️ Basic mathematical concepts necessary to understand machine learning.

➖➖➖

9️⃣ Introduction to Statistical Learning

◀️ Learn algorithms with practical examples.

➖➖➖

1️⃣0️⃣ Machine learning with a probabilistic approach

◀️ Better understanding modeling and uncertainty with a statistical perspective.

➖➖➖

1️⃣1️⃣ UBC Machine Learning

◀️ A deep understanding of machine learning concepts, with conceptual teaching from one of the leading professors in the field of ML.

➖➖➖

1️⃣2️⃣ Deep Learning with Andrew Ng

◀️ A strong start in the world of neural networks, CNNs and RNNs.

➖➖➖

1️⃣3️⃣ Linear Algebra with 3Blue1Brown

◀️ Intuitive and visual teaching of linear algebra concepts.

➖➖➖

1️⃣4️⃣ Machine Learning Course

◀️ A combination of theory and practical training to strengthen ML skills.

➖➖➖

1️⃣5️⃣ Mathematical Optimization with Python

◀️ You will learn the basic concepts of optimization with Python code.

➖➖➖

1️⃣6️⃣ Explainable models in machine learning

◀️ Making complex models understandable.

➖➖➖

1️⃣7️⃣ Data Analysis with Python

◀️ Data analysis skills using Pandas and NumPy libraries.


#DataScience #MachineLearning #DeepLearning #Python #AI #MLProjects #DataAnalysis #ExplainableAI #100DaysOfCode #TechEducation #MLInterviewPrep #NeuralNetworks #MathForML #Statistics #Coding #AIForEveryone #PythonForDataScience


⚡️ BEST DATA SCIENCE CHANNELS ON TELEGRAM 🌟
04/12/2025, 20:37
t.me/datasciencet/20537
Machine Learning Algorithms Cheatsheet

⚡️ BEST DATA SCIENCE CHANNELS ON TELEGRAM 🌟
04/12/2025, 15:06
t.me/datasciencet/20536
ZClip: Adaptive Spike Mitigation for LLM Pre-Training

🖥 Github: https://github.com/bluorion-com/ZClip

📕 Paper: https://arxiv.org/abs/2504.02507v1

🔗 Dataset: https://paperswithcode.com/dataset/hellaswag
04/11/2025, 14:34
t.me/datasciencet/20535
🔥 #YOLOv12 is out – new SOTA! ⚡️

👉 YOLOv12 is a novel attention-centric YOLO framework that matches the speed of previous CNN-based versions while harnessing the performance benefits of attention mechanisms.

💙 Source Code & Demo released:
▶️ Review: https://t.ly/jj1oR
▶️ Paper: arXiv
👉 Repo: GitHub
🤗 Demo: https://t.ly/w5rno

#AI #DeepLearning #ComputerVision #YOLO #AttentionMechanism #OpenSource
04/10/2025, 09:29
t.me/datasciencet/20534
🐈 TTT Long Video Generation 🐈

👉 A novel architecture for video generation, adapting the #CogVideoX 5B model by incorporating #TestTimeTraining (TTT) layers.
Adding TTT layers into a pre-trained Transformer enables generating a one-minute clip from text storyboards.
Videos, code & annotations released 💙

🔗 Review: https://t.ly/mhlTN
📄 Paper: arxiv.org/pdf/2504.05298
🌐 Project: test-time-training.github.io/video-dit
💻 Repo: github.com/test-time-training/ttt-video-dit

#AI #VideoGeneration #MachineLearning #DeepLearning #Transformers #TTT #GenerativeAI

⭐️ BEST DATA SCIENCE CHANNELS ON TELEGRAM ⭐️
04/10/2025, 09:11
t.me/datasciencet/20533
#Microsoft launched a #FREE course!!!

 "Web Development for Beginners"

It'll take only 12 weeks to complete. Learn #HTML, #CSS, #JavaScript, #Git, and #GitHub.

The course link:
https://microsoft.github.io/Web-Dev-For-Beginners/

⭐️ BEST DATA SCIENCE CHANNELS ON TELEGRAM ⭐️
04/10/2025, 08:57
t.me/datasciencet/20529
This channel is for Programmers, Coders, and Software Engineers.

0️⃣ Python
1️⃣ Data Science
2️⃣ Machine Learning
3️⃣ Data Visualization
4️⃣ Artificial Intelligence
5️⃣ Data Analysis
6️⃣ Statistics
7️⃣ Deep Learning
8️⃣ Programming Languages

✅ https://t.me/addlist/8_rRW2scgfRhOTc0

✅ https://t.me/Codeprogrammer
04/09/2025, 23:04
t.me/datasciencet/20528
🌳 Compose Anything is Out! 🌳
#SkyworkAI unveils #SkyReelsA2 — a controllable video generation framework that can assemble arbitrary visual elements (e.g., characters, objects, backgrounds) into fully synthesized videos from text prompts.
Code, models, and evaluation benchmark are all released!
🔗 Resources:
Review: https://t.ly/MEjzL
Paper: https://arxiv.org/pdf/2504.02436
Project: https://skyworkai.github.io/skyreels-a2.github.io/
Repo: https://github.com/SkyworkAI/SkyReels-A2
🤗 Models: https://huggingface.co/Skywork/SkyReels-A2

#AI #VideoGeneration #Multimodal #GenerativeAI #SkyReels #OpenSource

https://t.me/DataScienceT ✅
04/08/2025, 17:31
t.me/datasciencet/20527
⛽ VoRA: Vision as LoRA ⛽
#ByteDance introduces #VoRA (Vision as #LoRA) — a novel framework that transforms #LLMs into Multimodal Large Language Models (MLLMs) by integrating vision-specific LoRA layers.
All training data, source code, and model weights are openly available!

Key Resources:
Overview: https://t.ly/guNVN
Paper: arxiv.org/pdf/2503.20680
GitHub Repo: github.com/Hon-Wong/VoRA
Project Page: georgeluimmortal.github.io/vora-homepage.github.io
04/08/2025, 17:28
t.me/datasciencet/20526
CPPO: Accelerating the Training of Group Relative Policy Optimization-Based Reasoning Models

28 Mar 2025 · Zhihang Lin, Mingbao Lin, Yuan Xie, Rongrong Ji

This paper introduces Completion Pruning Policy Optimization (CPPO) to accelerate the training of reasoning models based on Group Relative Policy Optimization (GRPO). GRPO, while effective, incurs high training costs due to the need for sampling multiple completions for each question. Our experiments and theoretical analysis reveal that the number of completions impacts model accuracy yet increases training time multiplicatively, and that not all completions contribute equally to policy training -- their contribution depends on their relative advantage. To address these issues, we propose CPPO, which prunes completions with low absolute advantages, significantly reducing the number needed for gradient calculation and updates. Additionally, we introduce a dynamic completion allocation strategy to maximize GPU utilization by incorporating additional questions, further enhancing training efficiency. Experimental results demonstrate that CPPO achieves substantial speedups on GSM8K and MATH while preserving or even enhancing accuracy compared to the original GRPO. We release our code at https://github.com/lzhxmu/CPPO.
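
A minimal sketch of the pruning step, with an illustrative keep ratio: compute GRPO-style group-relative advantages, then keep only the completions with the largest absolute advantage for the policy update.

```python
import numpy as np

def group_relative_advantages(rewards):
    """GRPO-style advantages: standardize rewards within one question's group of completions."""
    r = np.asarray(rewards, dtype=np.float32)
    return (r - r.mean()) / (r.std() + 1e-6)

def prune_completions(completions, rewards, keep_ratio=0.5):
    """CPPO-style pruning: keep the completions with the largest |advantage|."""
    adv = group_relative_advantages(rewards)
    keep = np.argsort(-np.abs(adv))[:max(1, int(len(completions) * keep_ratio))]
    return [completions[i] for i in keep], adv[keep]

completions = [f"solution_{i}" for i in range(8)]
rewards = [1.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 1.0]   # e.g., exact-match correctness
kept, kept_adv = prune_completions(completions, rewards)
# Only `kept` enters gradient computation, roughly halving the per-step cost in this example.
```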

Paper: https://arxiv.org/pdf/2503.22342v1.pdf

Code: https://github.com/lzhxmu/cppo

Datasets: GSM8K - MATH

https://t.me/DataScienceT ⭐
04/08/2025, 07:59
t.me/datasciencet/20525
4 advanced attention mechanisms you should know:

• Slim attention — 8× less memory, 5× faster generation by storing only K from KV pairs and recomputing V.

• XAttention — 13.5× speedup on long sequences via "looking" at the sum of values along diagonal lines in the attention matrix.

• Kolmogorov-Arnold Attention, KArAt — Adaptable attention with learnable activation functions using KANs instead of softmax.

• Multi-token attention (MTA) — Lets the model consider groups of nearby words together for smarter long-context handling.

Read the overview of them in our free article on https://huggingface.co/blog/Kseniase/attentions
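
A minimal sketch of the Slim attention idea from the first bullet, under the assumption that W_k is square and invertible: since V = X W_v and X = K W_k^{-1}, one can cache only K and recover V on the fly as K (W_k^{-1} W_v).

```python
import numpy as np

d = 64
rng = np.random.default_rng(0)
W_k = rng.standard_normal((d, d)) / np.sqrt(d)   # assumed square and invertible
W_v = rng.standard_normal((d, d)) / np.sqrt(d)
M = np.linalg.inv(W_k) @ W_v                      # precomputed once per layer

X = rng.standard_normal((128, d))                 # token activations
K = X @ W_k                                       # only K is kept in the cache
V_recomputed = K @ M                              # V recovered from K when needed
assert np.allclose(V_recomputed, X @ W_v, atol=1e-6)
```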

https://t.me/DataScienceM 🌟
04/07/2025, 17:55
t.me/datasciencet/20522
Crystal Generation with Space Group Informed Transformer

🖥 Github: https://github.com/deepmodeling/crystalformer

📕 Paper: https://arxiv.org/abs/2504.02367v1

🔗 Dataset: https://paperswithcode.com/dataset/alex-20
04/07/2025, 17:03
t.me/datasciencet/20521
Large Language Model Agent: A Survey on Methodology, Applications and Challenges

27 Mar 2025 · Junyu Luo, Weizhi Zhang, Ye Yuan, Yusheng Zhao, Junwei Yang, Yiyang Gu, Bohan Wu, Binqi Chen, Ziyue Qiao, Qingqing Long, RongCheng Tu, Xiao Luo, Wei Ju, Zhiping Xiao, Yifan Wang, Meng Xiao, Chenwu Liu, Jingyang Yuan, Shichang Zhang, Yiqiao Jin, Fan Zhang, Xian Wu, Hanqing Zhao, DaCheng Tao, Philip S. Yu, Ming Zhang

The era of intelligent agents is upon us, driven by revolutionary advancements in large language models. Large Language Model (LLM) agents, with goal-driven behaviors and dynamic adaptation capabilities, potentially represent a critical pathway toward artificial general intelligence. This survey systematically deconstructs LLM agent systems through a methodology-centered taxonomy, linking architectural foundations, collaboration mechanisms, and evolutionary pathways. We unify fragmented research threads by revealing fundamental connections between agent design principles and their emergent behaviors in complex environments. Our work provides a unified architectural perspective, examining how agents are constructed, how they collaborate, and how they evolve over time, while also addressing evaluation methodologies, tool applications, practical challenges, and diverse application domains. By surveying the latest developments in this rapidly evolving field, we offer researchers a structured taxonomy for understanding LLM agents and identify promising directions for future research. The collection is available at https://github.com/luo-junyu/Awesome-Agent-Papers.

Paper: https://arxiv.org/pdf/2503.21460v1.pdf

Code: https://github.com/luo-junyu/awesome-agent-papers

https://t.me/DataScienceT ✉️
04/06/2025, 00:32
t.me/datasciencet/20520
The latest and most up-to-date cyber news is presented on PPHM HACKER NEWS.
PPHM subscribers are the first to receive firsthand cyber and tech news.

You won't miss any cyber news with us.


https://t.me/pphm_HackerNews
04/05/2025, 20:17
t.me/datasciencet/20518
TripoSG: High-Fidelity 3D Shape Synthesis using Large-Scale Rectified Flow Models

10 Feb 2025 · Yangguang Li, Zi-Xin Zou, Zexiang Liu, Dehu Wang, Yuan Liang, Zhipeng Yu, Xingchao Liu, Yuan-Chen Guo, Ding Liang, Wanli Ouyang, Yan-Pei Cao ·

Recent advancements in diffusion techniques have propelled image and video generation to unprecedented levels of quality, significantly accelerating the deployment and application of generative AI. However, 3D shape generation technology has so far lagged behind, constrained by limitations in 3D data scale, complexity of 3D data processing, and insufficient exploration of advanced techniques in the 3D domain. Current approaches to 3D shape generation face substantial challenges in terms of output quality, generalization capability, and alignment with input conditions. We present TripoSG, a new streamlined shape diffusion paradigm capable of generating high-fidelity 3D meshes with precise correspondence to input images. Specifically, we propose: 1) A large-scale rectified flow transformer for 3D shape generation, achieving state-of-the-art fidelity through training on extensive, high-quality data. 2) A hybrid supervised training strategy combining SDF, normal, and eikonal losses for 3D VAE, achieving high-quality 3D reconstruction performance. 3) A data processing pipeline to generate 2 million high-quality 3D samples, highlighting the crucial rules for data quality and quantity in training 3D generative models. Through comprehensive experiments, we have validated the effectiveness of each component in our new framework. The seamless integration of these parts has enabled TripoSG to achieve state-of-the-art performance in 3D shape generation. The resulting 3D shapes exhibit enhanced detail due to high-resolution capabilities and demonstrate exceptional fidelity to input images. Moreover, TripoSG demonstrates improved versatility in generating 3D models from diverse image styles and contents, showcasing strong generalization capabilities. To foster progress and innovation in the field of 3D generation, we will make our model publicly available.
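
A minimal sketch of the eikonal term mentioned in the hybrid VAE supervision: a true signed distance field has a unit-norm gradient everywhere, so the loss penalizes (||∇f(x)|| - 1)² at sampled points (the SDF and sampling below are illustrative).

```python
import torch

def eikonal_loss(sdf_fn, points):
    """Penalize deviation of the SDF gradient norm from 1 at the sampled points."""
    points = points.clone().requires_grad_(True)
    sdf = sdf_fn(points)
    grad = torch.autograd.grad(sdf.sum(), points, create_graph=True)[0]
    return ((grad.norm(dim=-1) - 1.0) ** 2).mean()

# Toy SDF of a unit sphere, f(x) = ||x|| - 1, whose gradient norm is exactly 1.
sphere_sdf = lambda p: p.norm(dim=-1) - 1.0
print(eikonal_loss(sphere_sdf, torch.randn(1024, 3)))   # ~0
```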

Paper: https://arxiv.org/pdf/2502.06608v3.pdf

Codes:
https://github.com/VAST-AI-Research/TripoSG
https://github.com/tencent/flashvdm

Dataset: 100poisonMpts

#DataScience #ArtificialIntelligence #MachineLearning #PythonProgramming #DeepLearning #LLM #AIResearch #BigData #NeuralNetworks #DataAnalytics #NLP #AutoML #DataVisualization #ScikitLearn #Pandas #NumPy #TensorFlow #AIethics #PredictiveModeling #GPUComputing #OpenSourceAI #DeepSeek #RAG #Agents #GPT4

https://t.me/DataScienceT
04/05/2025, 10:26
t.me/datasciencet/20516
Explore the world of Data Science through Jupyter Notebooks—insights, tutorials, and tools to boost your data journey. Code, analyze, and visualize smarter with every post.

https://t.me/DataScienceN
04/04/2025, 15:56
t.me/datasciencet/20515
Effect-driven interpretation: Functors for natural language composition

🖥 Github: https://github.com/UCSC-VLAA/MedReason

📕 Paper: https://arxiv.org/abs/2504.00993v1

🔗 Tasks: https://paperswithcode.com/task/knowledge-graphs
04/04/2025, 14:17
t.me/datasciencet/20513
Open Deep Search: Democratizing Search with Open-source Reasoning Agents

26 Mar 2025 · Salaheddin Alzubi, Creston Brooks, Purva Chiniya, Edoardo Contente, Chiara von Gerlach, Lucas Irwin, Yihan Jiang, Arda Kaz, Windsor Nguyen, Sewoong Oh, Himanshu Tyagi, Pramod Viswanath ·

We introduce Open Deep Search (ODS) to close the increasing gap between the proprietary search AI solutions, such as Perplexity's Sonar Reasoning Pro and OpenAI's GPT-4o Search Preview, and their open-source counterparts. The main innovation introduced in ODS is to augment the reasoning capabilities of the latest open-source LLMs with reasoning agents that can judiciously use web search tools to answer queries. Concretely, ODS consists of two components that work with a base LLM chosen by the user: Open Search Tool and Open Reasoning Agent. Open Reasoning Agent interprets the given task and completes it by orchestrating a sequence of actions that includes calling tools, one of which is the Open Search Tool. Open Search Tool is a novel web search tool that outperforms proprietary counterparts. Together with powerful open-source reasoning LLMs, such as DeepSeek-R1, ODS nearly matches and sometimes surpasses the existing state-of-the-art baselines on two benchmarks: SimpleQA and FRAMES. For example, on the FRAMES evaluation benchmark, ODS improves the best existing baseline of the recently released GPT-4o Search Preview by 9.7% in accuracy. ODS is a general framework for seamlessly augmenting any LLMs -- for example, DeepSeek-R1 that achieves 82.4% on SimpleQA and 30.1% on FRAMES -- with search and reasoning capabilities to achieve state-of-the-art performance: 88.3% on SimpleQA and 75.3% on FRAMES.
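
A minimal skeleton of the described pattern, with the LLM and the search tool stubbed out (interfaces are assumed for illustration, not taken from the ODS repo): the agent loops, either issuing a search query or committing to a final answer.

```python
from typing import Callable, Dict, List

def react_search_agent(question: str,
                       llm: Callable[[List[Dict]], str],
                       search: Callable[[str], str],
                       max_steps: int = 4) -> str:
    """Reason-act loop: the LLM replies 'SEARCH: <query>' or 'ANSWER: <final answer>'."""
    messages = [{"role": "system",
                 "content": "Think step by step. Reply 'SEARCH: <query>' to use the web search "
                            "tool, or 'ANSWER: <final answer>' when done."},
                {"role": "user", "content": question}]
    for _ in range(max_steps):
        reply = llm(messages)
        messages.append({"role": "assistant", "content": reply})
        if reply.startswith("ANSWER:"):
            return reply[len("ANSWER:"):].strip()
        if reply.startswith("SEARCH:"):
            results = search(reply[len("SEARCH:"):].strip())
            messages.append({"role": "user", "content": f"Search results:\n{results}"})
    return "No answer within the step budget."

# Stubs so the skeleton runs without API keys.
fake_llm = lambda msgs: "SEARCH: capital of Australia" if len(msgs) <= 2 else "ANSWER: Canberra"
fake_search = lambda q: "Canberra is the capital city of Australia."
print(react_search_agent("What is the capital of Australia?", fake_llm, fake_search))
```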

Paper: https://arxiv.org/pdf/2503.20201v1.pdf

Code: https://github.com/sentient-agi/opendeepsearch

#DataScience #ArtificialIntelligence #MachineLearning #PythonProgramming #DeepLearning #LLM #AIResearch #BigData #NeuralNetworks #DataAnalytics #NLP #AutoML #DataVisualization #ScikitLearn #Pandas #NumPy #TensorFlow #AIethics #PredictiveModeling #GPUComputing #OpenSourceAI #DeepSeek #RAG #Agents #GPT4

https://t.me/DataScienceT
04/04/2025, 10:05
t.me/datasciencet/20512
🙏💸 500$ FOR THE FIRST 500 WHO JOIN THE CHANNEL! 🙏💸

Join our channel today for free! Tomorrow it will cost 500$!

https://t.me/+vhF2zNz5GBw3NTU1

You can join at this link! 👆👇

https://t.me/+vhF2zNz5GBw3NTU1
04/03/2025, 15:09
t.me/datasciencet/20511
This channel is for Programmers, Coders, and Software Engineers.

0️⃣ Python
1️⃣ Data Science
2️⃣ Machine Learning
3️⃣ Data Visualization
4️⃣ Artificial Intelligence
5️⃣ Data Analysis
6️⃣ Statistics
7️⃣ Deep Learning
8️⃣ Programming Languages

✅ https://t.me/addlist/8_rRW2scgfRhOTc0

✅ https://t.me/Codeprogrammer
04/02/2025, 23:54
t.me/datasciencet/20510
Harnessing the Reasoning Economy: A Survey of Efficient Reasoning for Large Language Models

🖥 Github: https://github.com/devoallen/awesome-reasoning-economy-papers

📕 Paper: https://arxiv.org/abs/2503.24377v1
04/02/2025, 22:59
t.me/datasciencet/20509
Long-Context Autoregressive Video Modeling with Next-Frame Prediction

25 Mar 2025 · YuChao Gu, Weijia Mao, Mike Zheng Shou ·

Long-context autoregressive modeling has significantly advanced language generation, but video generation still struggles to fully utilize extended temporal contexts. To investigate long-context video modeling, we introduce Frame AutoRegressive (FAR), a strong baseline for video autoregressive modeling. Just as language models learn causal dependencies between tokens (i.e., Token AR), FAR models temporal causal dependencies between continuous frames, achieving better convergence than Token AR and video diffusion transformers. Building on FAR, we observe that long-context vision modeling faces challenges due to visual redundancy. Existing RoPE lacks effective temporal decay for remote context and fails to extrapolate well to long video sequences. Additionally, training on long videos is computationally expensive, as vision tokens grow much faster than language tokens. To tackle these issues, we propose balancing locality and long-range dependency. We introduce FlexRoPE, a test-time technique that adds flexible temporal decay to RoPE, enabling extrapolation to 16x longer vision contexts. Furthermore, we propose long short-term context modeling, where a high-resolution short-term context window ensures fine-grained temporal consistency, while an unlimited long-term context window encodes long-range information using fewer tokens. With this approach, we can train on long video sequences with a manageable token context length. We demonstrate that FAR achieves state-of-the-art performance in both short- and long-video generation, providing a simple yet effective baseline for video autoregressive modeling.
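
A minimal sketch of the general "temporal decay over remote context" idea, written as an ALiBi-style additive bias rather than FAR's actual FlexRoPE formulation: attention logits are penalized in proportion to the frame distance between query and key tokens.

```python
import torch

def temporal_decay_bias(num_frames, tokens_per_frame, slope=0.05):
    """Additive attention bias that grows (negatively) with temporal distance between frames."""
    frame_idx = torch.arange(num_frames).repeat_interleave(tokens_per_frame)
    dist = (frame_idx[:, None] - frame_idx[None, :]).abs().float()
    return -slope * dist   # added to attention logits before the softmax

bias = temporal_decay_bias(num_frames=8, tokens_per_frame=4)
scores = torch.randn(32, 32) + bias          # (query, key) logits over 8 frames x 4 tokens
attn = torch.softmax(scores, dim=-1)         # remote frames receive progressively less weight
```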

Paper: https://arxiv.org/pdf/2503.19325v1.pdf

Code: https://github.com/showlab/FAR

Dataset: UCF101

Ranked #2 on Video Generation on UCF-101

https://t.me/DataScienceT ⚠️
04/01/2025, 17:28
t.me/datasciencet/20508
LHM: Large Animatable Human Reconstruction Model from a Single Image in Seconds

13 Mar 2025 · Lingteng Qiu, Xiaodong Gu, Peihao Li, Qi Zuo, Weichao Shen, Junfei Zhang, Kejie Qiu, Weihao Yuan, GuanYing Chen, Zilong Dong, Liefeng Bo ·

Animatable 3D human reconstruction from a single image is a challenging problem due to the ambiguity in decoupling geometry, appearance, and deformation. Recent advances in 3D human reconstruction mainly focus on static human modeling, and reliance on synthetic 3D scans for training limits their generalization ability. Conversely, optimization-based video methods achieve higher fidelity but demand controlled capture conditions and computationally intensive refinement processes. Motivated by the emergence of large reconstruction models for efficient static reconstruction, we propose LHM (Large Animatable Human Reconstruction Model) to infer high-fidelity avatars represented as 3D Gaussian splatting in a feed-forward pass. Our model leverages a multimodal transformer architecture to effectively encode the human body positional features and image features with an attention mechanism, enabling detailed preservation of clothing geometry and texture. To further boost the face identity preservation and fine detail recovery, we propose a head feature pyramid encoding scheme to aggregate multi-scale features of the head regions. Extensive experiments demonstrate that our LHM generates plausible animatable humans in seconds without post-processing for face and hands, outperforming existing methods in both reconstruction accuracy and generalization ability.

Paper: https://arxiv.org/pdf/2503.10625v1.pdf

Code: https://github.com/aigc3d/LHM

https://t.me/DataScienceT ⚠️
03/31/2025, 17:24
t.me/datasciencet/20507
InfiniteYou: Flexible Photo Recrafting While Preserving Your Identity

20 Mar 2025 · Liming Jiang, Qing Yan, Yumin Jia, Zichuan Liu, Hao Kang, Xin Lu ·

Achieving flexible and high-fidelity identity-preserved image generation remains formidable, particularly with advanced Diffusion Transformers (DiTs) like FLUX. We introduce InfiniteYou (InfU), one of the earliest robust frameworks leveraging DiTs for this task. InfU addresses significant issues of existing methods, such as insufficient identity similarity, poor text-image alignment, and low generation quality and aesthetics. Central to InfU is InfuseNet, a component that injects identity features into the DiT base model via residual connections, enhancing identity similarity while maintaining generation capabilities. A multi-stage training strategy, including pretraining and supervised fine-tuning (SFT) with synthetic single-person-multiple-sample (SPMS) data, further improves text-image alignment, ameliorates image quality, and alleviates face copy-pasting. Extensive experiments demonstrate that InfU achieves state-of-the-art performance, surpassing existing baselines. In addition, the plug-and-play design of InfU ensures compatibility with various existing methods, offering a valuable contribution to the broader community.
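
A minimal sketch of injecting identity features into a transformer block through a residual branch, zero-initialized so the pretrained model's behavior is untouched at the start of training; this mirrors the residual-connection description above, not the actual InfuseNet code.

```python
import torch
import torch.nn as nn

class IdentityInjectedBlock(nn.Module):
    """Wrap a (frozen) DiT-style block and add identity features via a zero-init residual."""
    def __init__(self, block: nn.Module, dim: int, id_dim: int):
        super().__init__()
        self.block = block
        self.id_proj = nn.Linear(id_dim, dim)
        nn.init.zeros_(self.id_proj.weight)   # starts as a no-op, preserving generation quality
        nn.init.zeros_(self.id_proj.bias)

    def forward(self, x, id_feats):
        # x: (B, N, dim) latent tokens; id_feats: (B, id_dim) face-identity embedding
        return self.block(x) + self.id_proj(id_feats).unsqueeze(1)

base = nn.Sequential(nn.LayerNorm(256), nn.Linear(256, 256), nn.GELU(), nn.Linear(256, 256))
blk = IdentityInjectedBlock(base, dim=256, id_dim=512)
out = blk(torch.randn(2, 64, 256), torch.randn(2, 512))
```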

Paper: https://arxiv.org/pdf/2503.16418v1.pdf

Code: https://github.com/bytedance/infiniteyou

Dataset: 10,000 People - Human Pose Recognition Data

https://t.me/DataScienceT ⚠️
03/30/2025, 17:11
t.me/datasciencet/20506
Greetings.
As part of our research, we plan to write a review article in the field of pathology. Colleagues interested in the second and third authorship positions on this topic can participate.

✅ Approximate start time: April 10th.

Journal: scientific reports https://www.nature.com/srep/

Price:
Position 2: $400
Position 3: $300

I will help with complete explanations and how to write each section.

@Raminmousa
@Machine_learn
@Paper4money
03/27/2025, 13:38
t.me/datasciencet/20505
🌊 Hydrate Smarter & Get $10 Back! 💧

Looking for the ultimate way to boost your hydration? Our Hydrogen Water Bottle transforms regular water into hydrogen-rich, antioxidant-packed refreshment! ✨

✔️ Elevate your wellness with every sip
✔️ Experience the benefits of hydrogen-rich hydration
✔️ Stay energized & refreshed all day long

💰 Special Offer: Leave a review after your purchase & get $10 back on this order!

Ready to upgrade your hydration game?

Get yours now!


Supported by WaybienAds
03/26/2025, 18:22
t.me/datasciencet/20504
FastCuRL: Curriculum Reinforcement Learning with Progressive Context Extension for Efficient Training R1-like Reasoning Models

🖥 Github: https://github.com/nick7nlp/FastCuRL

📕 Paper: https://arxiv.org/abs/2503.17287v1

🌟 Tasks: https://paperswithcode.com/task/language-modeling
03/25/2025, 00:29
t.me/datasciencet/20502
PiEEG kit - bioscience Lab in home for your Brain and Body

🖥 Github: https://github.com/pieeg-club/PiEEG_Kit

📕 Paper: https://arxiv.org/abs/2503.13482

🌟 Methods: https://paperswithcode.com/task/eeg-1
03/21/2025, 19:56
t.me/datasciencet/20501
⚡️ MaTVLM: Hybrid Mamba-Transformer for Efficient Vision-Language Modeling

🖥 Github: https://github.com/hustvl/MaTVLM

📕 Paper: https://arxiv.org/abs/2503.13440v1

🌟 Methods: https://paperswithcode.com/method/speed
03/20/2025, 18:01
t.me/datasciencet/20500
⚡ Special Offer for 3 Days:

Join our exclusive (paid) channel for a lifetime and get your own bot to download over a million books with daily updates, and a special bot to download more than 80 million scientific articles, all for just $30, a one-time payment only.

The paid channel includes a vast encyclopedia of the most famous books and training courses in all fields of data science and programming languages.

We only accept payments via PayPal and cryptocurrencies.

Contact:
t.me/HusseinSheikho ✅
03/19/2025, 16:09
t.me/datasciencet/20499
⚡ Special Offer for 3 Days:

Join our exclusive (paid) channel for a lifetime and get your own bot to download over a million books with daily updates, and a special bot to download more than 80 million scientific articles, all for just $30, a one-time payment only.

The paid channel includes a vast encyclopedia of the most famous books and training courses in all fields of data science and programming languages.

We only accept payments via PayPal and cryptocurrencies.

Contact:
t.me/HusseinSheikho ✅
03/19/2025, 00:14
t.me/datasciencet/20498
Executable Code Actions Elicit Better LLM Agents

1 Feb 2024 · Xingyao Wang, Yangyi Chen, Lifan Yuan, Yizhe Zhang, Yunzhu Li, Hao Peng, Heng Ji

Large Language Model (LLM) agents, capable of performing a broad range of actions, such as invoking tools and controlling robots, show great potential in tackling real-world challenges. LLM agents are typically prompted to produce actions by generating #JSON or text in a pre-defined format, which is usually limited by constrained action space (e.g., the scope of pre-defined tools) and restricted flexibility (e.g., inability to compose multiple tools). This work proposes to use executable Python code to consolidate LLM agents' actions into a unified action space (CodeAct). Integrated with a Python interpreter, CodeAct can execute code actions and dynamically revise prior actions or emit new actions upon new observations through multi-turn interactions. Our extensive analysis of 17 LLMs on API-Bank and a newly curated benchmark shows that CodeAct outperforms widely used alternatives (up to 20% higher success rate). The encouraging performance of CodeAct motivates us to build an open-source #LLM agent that interacts with environments by executing interpretable code and collaborates with users using natural language. To this end, we collect an instruction-tuning dataset CodeActInstruct that consists of 7k multi-turn interactions using CodeAct. We show that it can be used with existing data to improve models in agent-oriented tasks without compromising their general capability. CodeActAgent, finetuned from Llama2 and Mistral, is integrated with #Python interpreter and uniquely tailored to perform sophisticated tasks (e.g., model training) using existing libraries and autonomously self-debug.
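
A minimal sketch of the execute-and-observe loop behind CodeAct (the LLM is stubbed; a real deployment would sandbox execution): the agent's action is a Python snippet, it runs in a persistent namespace, and stdout or the traceback becomes the next observation.

```python
import contextlib
import io
import traceback

def run_code_action(code: str, namespace: dict) -> str:
    """Execute a model-emitted snippet; return its stdout or the traceback as the observation."""
    buffer = io.StringIO()
    try:
        with contextlib.redirect_stdout(buffer):
            exec(code, namespace)   # persistent namespace keeps state across turns; sandbox in practice
    except Exception:
        return traceback.format_exc()
    return buffer.getvalue()

namespace = {}
# Turn 1: the "agent" defines a helper. Turn 2: it reuses that state in a revised action.
obs1 = run_code_action("def mean(xs):\n    return sum(xs) / len(xs)\nprint('defined mean')", namespace)
obs2 = run_code_action("print(mean([1, 2, 3, 4]))", namespace)
print(obs1, obs2)   # each observation is appended to the conversation before the next LLM call
```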

Paper: https://arxiv.org/pdf/2402.01030v4.pdf

Codes:
https://github.com/epfllm/megatron-llm
https://github.com/xingyaoww/code-act

Datasets: MMLU - GSM8K - HumanEval - MATH

https://t.me/DataScienceT ⚠️
03/18/2025, 15:41
t.me/datasciencet/20497
⚡️ TxAgent: An AI agent for therapeutic reasoning across a universe of tools

🖥 Github: https://github.com/mims-harvard/TxAgent

📕 Paper: https://arxiv.org/abs/2503.10970v1

🌟 Methods: https://paperswithcode.com/method/align
03/18/2025, 04:45
t.me/datasciencet/20496
LMM-R1: Empowering 3B LMMs with Strong Reasoning Abilities Through Two-Stage Rule-Based RL

10 Mar 2025 · Yingzhe Peng, Gongrui Zhang, Miaosen Zhang, Zhiyuan You, Jie Liu, Qipeng Zhu, Kai Yang, Xingzhong Xu, Xin Geng, Xu Yang

Enhancing reasoning in Large Multimodal Models (#LMMs) faces unique challenges from the complex interplay between visual perception and logical reasoning, particularly in compact 3B-parameter architectures where architectural constraints limit reasoning capacity and modality alignment. While rule-based reinforcement learning (RL) excels in text-only domains, its multimodal extension confronts two critical barriers: (1) data limitations due to ambiguous answers and scarce complex reasoning examples, and (2) degraded foundational reasoning induced by multimodal pretraining. To address these challenges, we propose LMM-R1, a two-stage framework adapting rule-based RL for multimodal reasoning through Foundational Reasoning Enhancement (FRE) followed by Multimodal Generalization Training (MGT). The FRE stage first strengthens reasoning abilities using text-only data with rule-based RL, then the MGT stage generalizes these reasoning capabilities to multimodal domains. Experiments on Qwen2.5-VL-Instruct-3B demonstrate that LMM-R1 achieves 4.83% and 4.5% average improvements over baselines in multimodal and text-only benchmarks, respectively, with a 3.63% gain in complex Football Game tasks. These results validate that text-based reasoning enhancement enables effective multimodal generalization, offering a data-efficient paradigm that bypasses costly high-quality multimodal training data.
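
A minimal sketch of a verifiable, rule-based reward of the kind used in rule-based RL stages such as FRE (the boxed-answer format and the 0.1 format bonus are illustrative choices, not the paper's exact rules):

```python
import re

def rule_based_reward(response: str, reference: str) -> float:
    """Correct boxed answer -> 1.0; parsable but wrong -> 0.1; unparsable -> 0.0."""
    match = re.search(r"\\boxed\{([^}]*)\}", response)
    if match is None:
        return 0.0
    return 1.0 if match.group(1).strip() == reference.strip() else 0.1

print(rule_based_reward(r"... so the result is \boxed{42}", "42"))   # 1.0
print(rule_based_reward("I think the answer is 42", "42"))           # 0.0
```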

Paper: https://arxiv.org/pdf/2503.07536v1.pdf

code: https://github.com/tidedra/lmm-r1

https://t.me/DataScienceT 🧡
03/17/2025, 16:07
t.me/datasciencet/20495
VideoPainter: Any-length Video Inpainting and Editing with Plug-and-Play Context Control

7 Mar 2025 · Yuxuan Bian, Zhaoyang Zhang, Xuan Ju, Mingdeng Cao, Liangbin Xie, Ying Shan, Qiang Xu ·

Video inpainting, which aims to restore corrupted video content, has experienced substantial progress. Despite these advances, existing methods, whether propagating unmasked region pixels through optical flow and receptive field priors, or extending image-inpainting models temporally, face challenges in generating fully masked objects or balancing the competing objectives of background context preservation and foreground generation in one model, respectively. To address these limitations, we propose a novel dual-stream paradigm VideoPainter that incorporates an efficient context encoder (comprising only 6% of the backbone parameters) to process masked videos and inject backbone-aware background contextual cues to any pre-trained video DiT, producing semantically consistent content in a plug-and-play manner. This architectural separation significantly reduces the model's learning complexity while enabling nuanced integration of crucial background context. We also introduce a novel target region ID resampling technique that enables any-length video inpainting, greatly enhancing our practical applicability. Additionally, we establish a scalable dataset pipeline leveraging current vision understanding models, contributing VPData and VPBench to facilitate segmentation-based inpainting training and assessment, the largest video inpainting dataset and benchmark to date with over 390K diverse clips. Using inpainting as a pipeline basis, we also explore downstream applications including video editing and video editing pair data generation, demonstrating competitive performance and significant practical potential. Extensive experiments demonstrate VideoPainter's superior performance in both any-length video inpainting and editing, across eight key metrics, including video quality, mask region preservation, and textual coherence.

Paper: https://arxiv.org/pdf/2503.05639v2.pdf

Code: https://github.com/TencentARC/VideoPainter

Datasets: VPData - VPBench

https://t.me/DataScienceT 🎙
03/16/2025, 16:04
t.me/datasciencet/20494
Spark-TTS: An Efficient LLM-Based Text-to-Speech Model with Single-Stream Decoupled Speech Tokens

3 Mar 2025 · Xinsheng Wang, Mingqi Jiang, Ziyang Ma, Ziyu Zhang, Songxiang Liu, Linqin Li, Zheng Liang, Qixi Zheng, Rui Wang, Xiaoqin Feng, Weizhen Bian, Zhen Ye, Sitong Cheng, Ruibin Yuan, Zhixian Zhao, Xinfa Zhu, Jiahao Pan, Liumeng Xue, Pengcheng Zhu, Yunlin Chen, Zhifei Li, Xie Chen, Lei Xie, Yike Guo, Wei Xue ·

Recent advancements in large language models (LLMs) have driven significant progress in zero-shot text-to-speech (TTS) synthesis. However, existing foundation models rely on multi-stage processing or complex architectures for predicting multiple codebooks, limiting efficiency and integration flexibility. To overcome these challenges, we introduce Spark-TTS, a novel system powered by BiCodec, a single-stream speech codec that decomposes speech into two complementary token types: low-bitrate semantic tokens for linguistic content and fixed-length global tokens for speaker attributes. This disentangled representation, combined with the Qwen2.5 LLM and a chain-of-thought (CoT) generation approach, enables both coarse-grained control (e.g., gender, speaking style) and fine-grained adjustments (e.g., precise pitch values, speaking rate). To facilitate research in controllable TTS, we introduce VoxBox, a meticulously curated 100,000-hour dataset with comprehensive attribute annotations. Extensive experiments demonstrate that Spark-TTS not only achieves state-of-the-art zero-shot voice cloning but also generates highly customizable voices that surpass the limitations of reference-based synthesis. Source code, pre-trained models, and audio samples are available at https://github.com/SparkAudio/Spark-TTS.

Paper: https://arxiv.org/pdf/2503.01710v1.pdf

Code: https://github.com/sparkaudio/spark-tts

#DataScience #ArtificialIntelligence #MachineLearning #PythonProgramming #DeepLearning #LLM #AIResearch #BigData #NeuralNetworks #DataAnalytics #NLP #AutoML #DataVisualization #ScikitLearn #Pandas #NumPy #TensorFlow #AIethics #PredictiveModeling #GPUComputing #OpenSourceAI #DeepSeek #RAG #Agents #GPT4

https://t.me/DataScienceT
03/15/2025, 15:53
t.me/datasciencet/20493
Seg-Zero: Reasoning-Chain Guided Segmentation via Cognitive Reinforcement

🖥 Github: https://github.com/yunncheng/MMRL

📕 Paper: https://arxiv.org/abs/2503.08497v1

🌟 Dataset: https://paperswithcode.com/dataset/imagenet-s

https://t.me/DataScienceT 💫
03/15/2025, 07:27
t.me/datasciencet/20492
🎁❗️TODAY FREE❗️🎁

Entry to our VIP channel is completely free today. Tomorrow it will cost $500! 🔥

JOIN 👇

https://t.me/+1TWrwFRud4U1YTVi
https://t.me/+1TWrwFRud4U1YTVi
https://t.me/+1TWrwFRud4U1YTVi
03/12/2025, 16:14
t.me/datasciencet/20491
MonSter: Marry Monodepth to Stereo Unleashes Power

15 Jan 2025 · Junda Cheng, Longliang Liu, Gangwei Xu, Xianqi Wang, Zhaoxing Zhang, Yong Deng, Jinliang Zang, Yurui Chen, Zhipeng Cai, Xin Yang ·

Stereo matching recovers depth from image correspondences. Existing methods struggle to handle ill-posed regions with limited matching cues, such as occlusions and textureless areas. To address this, we propose MonSter, a novel method that leverages the complementary strengths of monocular depth estimation and stereo matching. MonSter integrates monocular depth and stereo matching into a dual-branch architecture to iteratively improve each other. Confidence-based guidance adaptively selects reliable stereo cues for monodepth scale-shift recovery. The refined monodepth in turn guides stereo matching effectively in ill-posed regions. Such iterative mutual enhancement enables MonSter to evolve monodepth priors from coarse object-level structures to pixel-level geometry, fully unlocking the potential of stereo matching. As shown in Fig.1, MonSter ranks 1st across the five most commonly used leaderboards -- SceneFlow, KITTI 2012, KITTI 2015, Middlebury, and ETH3D -- achieving up to 49.5% improvement (Bad 1.0 on ETH3D) over the previous best method. Comprehensive analysis verifies the effectiveness of MonSter in ill-posed regions. In terms of zero-shot generalization, MonSter significantly and consistently outperforms the state-of-the-art across the board. The code is publicly available at: https://github.com/Junda24/MonSter.
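
A minimal sketch of the scale-shift recovery step described above (thresholds and shapes are illustrative): monocular depth is defined only up to scale and shift, so (s, t) are fit by least squares against disparities at high-confidence stereo pixels, and the aligned monodepth can then guide ill-posed regions.

```python
import numpy as np

def align_mono_to_stereo(mono_inv_depth, stereo_disparity, confidence, thresh=0.8):
    """Fit s, t so that s * mono + t matches the stereo disparity on confident pixels."""
    mask = confidence > thresh
    A = np.stack([mono_inv_depth[mask], np.ones(mask.sum())], axis=1)
    s, t = np.linalg.lstsq(A, stereo_disparity[mask], rcond=None)[0]
    return s * mono_inv_depth + t   # aligned monodepth, usable in occluded/textureless areas

mono = np.random.rand(240, 320)
stereo = 2.0 * mono + 0.5 + 0.01 * np.random.randn(240, 320)   # synthetic stereo observation
conf = np.random.rand(240, 320)
aligned = align_mono_to_stereo(mono, stereo, conf)
```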

Paper: https://arxiv.org/pdf/2501.08643v1.pdf

Code: https://github.com/junda24/monster

Datasets: KITTI - TartanAir

https://t.me/DataScienceT ✉️
03/11/2025, 14:19
t.me/datasciencet/20490
Kiss3DGen: Repurposing Image Diffusion Models for 3D Asset Generation

🖥 Github: https://github.com/EnVision-Research/Kiss3DGen

📕 Paper: https://arxiv.org/abs/2503.01370v1

🌟 Dataset: https://paperswithcode.com/dataset/nerf
03/10/2025, 20:36
t.me/datasciencet/20489
Beyond Next-Token: Next-X Prediction for Autoregressive Visual Generation

27 Feb 2025 · Sucheng Ren, Qihang Yu, Ju He, Xiaohui Shen, Alan Yuille, Liang-Chieh Chen ·

Autoregressive (AR) modeling, known for its next-token prediction paradigm, underpins state-of-the-art language and visual generative models. Traditionally, a "token" is treated as the smallest prediction unit, often a discrete symbol in language or a quantized patch in vision. However, the optimal token definition for 2D image structures remains an open question. Moreover, AR models suffer from exposure bias, where teacher forcing during training leads to error accumulation at inference. In this paper, we propose xAR, a generalized AR framework that extends the notion of a token to an entity X, which can represent an individual patch token, a cell (a grouping of neighboring patches), a subsample (a non-local grouping of distant patches), a scale (coarse-to-fine resolution), or even a whole image. Additionally, we reformulate discrete token classification as continuous entity regression, leveraging flow-matching methods at each AR step. This approach conditions training on noisy entities instead of ground-truth tokens, leading to Noisy Context Learning, which effectively alleviates exposure bias. As a result, xAR offers two key advantages: (1) it enables flexible prediction units that capture different contextual granularity and spatial structures, and (2) it mitigates exposure bias by avoiding reliance on teacher forcing. On the ImageNet-256 generation benchmark, our base model, xAR-B (172M), outperforms DiT-XL/SiT-XL (675M) while achieving 20x faster inference. Meanwhile, xAR-H sets a new state-of-the-art with an FID of 1.24, running 2.2x faster than the previous best-performing model without relying on vision foundation modules (e.g., DINOv2) or advanced guidance interval sampling.
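
A minimal sketch of the continuous entity regression idea, using linear-path flow matching with a tiny MLP as a stand-in for the AR backbone (all sizes are illustrative): interpolate between noise and the target entity and regress the constant velocity.

```python
import torch
import torch.nn as nn

def flow_matching_loss(net, target_entity):
    """Linear-path flow matching: predict the velocity (x1 - x0) at a random time t."""
    x1 = target_entity                    # ground-truth entity (e.g., a cell of patches)
    x0 = torch.randn_like(x1)             # noise sample
    t = torch.rand(x1.shape[0], 1)
    xt = (1 - t) * x0 + t * x1            # point on the straight path from noise to data
    v_pred = net(torch.cat([xt, t], dim=-1))
    return ((v_pred - (x1 - x0)) ** 2).mean()

dim = 32
net = nn.Sequential(nn.Linear(dim + 1, 128), nn.SiLU(), nn.Linear(128, dim))
loss = flow_matching_loss(net, torch.randn(64, dim))
loss.backward()
```
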
Paper: https://arxiv.org/pdf/2502.20388v1.pdf

Code: https://github.com/OliverRensu/xAR
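
A hedged PyTorch sketch of the "next-X" idea with X chosen as a 2x2 cell of patch tokens, plus one flow-matching regression step on a noisy entity. The small MLP "model" and the cell size are illustrative stand-ins, not xAR's actual architecture.

```python
import torch

def patches_to_cells(tokens, grid=16, cell=2):
    """Group a (B, grid*grid, D) patch sequence into (B, num_cells, cell*cell*D) entities."""
    B, N, D = tokens.shape
    x = tokens.view(B, grid, grid, D)
    x = x.unfold(1, cell, cell).unfold(2, cell, cell)           # (B, g/c, g/c, D, c, c)
    return x.contiguous().view(B, (grid // cell) ** 2, cell * cell * D)

def flow_matching_step(model, context, target_entity):
    """One noisy-entity training step: regress the velocity from noise to the entity."""
    t = torch.rand(target_entity.size(0), 1, 1)                 # random interpolation time
    noise = torch.randn_like(target_entity)
    x_t = (1 - t) * noise + t * target_entity                   # linear interpolation path
    v_target = target_entity - noise                            # flow-matching velocity target
    v_pred = model(torch.cat([context, x_t], dim=1))[:, -x_t.size(1):]
    return torch.nn.functional.mse_loss(v_pred, v_target)

B, D = 2, 8
cells = patches_to_cells(torch.randn(B, 16 * 16, D))            # (2, 64, 32)
toy_model = torch.nn.Sequential(torch.nn.Linear(32, 64), torch.nn.GELU(), torch.nn.Linear(64, 32))
print(flow_matching_step(toy_model, cells[:, :-1], cells[:, -1:]).item())
```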

https://t.me/DataScienceT ✅
03/10/2025, 14:10
t.me/datasciencet/20488
Know You First and Be You Better: Modeling Human-Like User Simulators via Implicit Profiles

26 Feb 2025 · Kuang Wang, Xianfei Li, Shenghao Yang, Li Zhou, Feng Jiang, Haizhou Li ·

User simulators are crucial for replicating human interactions with dialogue systems, supporting both collaborative training and automatic evaluation, especially for large language models (LLMs). However, existing simulators often rely solely on text utterances, missing implicit user traits such as personality, speaking style, and goals. In contrast, persona-based methods lack generalizability, as they depend on predefined profiles of famous individuals or archetypes. To address these challenges, we propose User Simulator with implicit Profiles (#USP), a framework that infers implicit user profiles from human-machine conversations and uses them to generate more personalized and realistic dialogues. We first develop an LLM-driven extractor with a comprehensive profile schema. Then, we refine the simulation through conditional supervised fine-tuning and reinforcement learning with cycle consistency, optimizing it at both the utterance and conversation levels. Finally, we adopt a diverse profile sampler to capture the distribution of real-world user profiles. Experimental results demonstrate that USP outperforms strong baselines in terms of authenticity and diversity while achieving comparable performance in consistency. Furthermore, dynamic multi-turn evaluations based on USP strongly align with mainstream benchmarks, demonstrating its effectiveness in real-world applications.
Paper: https://arxiv.org/pdf/2502.18968v1.pdf

Code: https://github.com/wangkevin02/USP

Dataset: LMSYS-USP
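
A toy sketch of the profile-extraction step: prompt an LLM to fill an implicit-profile schema from a conversation. `call_llm` and the schema fields are hypothetical placeholders, not part of the USP codebase.

```python
import json

PROFILE_SCHEMA = {
    "personality": "e.g. patient, blunt, curious",
    "speaking_style": "e.g. terse, emoji-heavy, formal",
    "goals": "what the user is trying to achieve in this conversation",
}

def extract_implicit_profile(conversation, call_llm):
    """Infer an implicit user profile from a human-machine dialogue."""
    prompt = (
        "Read the conversation and fill in the JSON schema describing the USER only.\n"
        f"Schema: {json.dumps(PROFILE_SCHEMA)}\n\nConversation:\n"
        + "\n".join(f"{turn['role']}: {turn['text']}" for turn in conversation)
        + "\n\nReturn only valid JSON."
    )
    return json.loads(call_llm(prompt))

# toy usage with a fake LLM that always returns the same profile
fake_llm = lambda prompt: json.dumps({"personality": "curious", "speaking_style": "terse", "goals": "fix a bug"})
convo = [{"role": "user", "text": "pd.merge keeps duplicating rows, why"},
         {"role": "assistant", "text": "Your join keys are probably not unique."}]
print(extract_implicit_profile(convo, fake_llm))
```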

#DataScience #ArtificialIntelligence #MachineLearning #PythonProgramming #DeepLearning #LLM #AIResearch #BigData #NeuralNetworks #DataAnalytics #NLP #AutoML #DataVisualization #ScikitLearn #Pandas #NumPy #TensorFlow #AIethics #PredictiveModeling #GPUComputing #OpenSourceAI #DeepSeek #RAG #Agents #GPT4

https://t.me/DataScienceT
03/10/2025, 09:59
t.me/datasciencet/20486
ViDoRAG: Visual Document Retrieval-Augmented Generation via Dynamic Iterative Reasoning Agents

25 Feb 2025 · Qiuchen Wang, Ruixue Ding, Zehui Chen, Weiqi Wu, Shihang Wang, Pengjun Xie, Feng Zhao ·

Understanding information from visually rich documents remains a significant challenge for traditional Retrieval-Augmented Generation (RAG) methods. Existing benchmarks predominantly focus on image-based question answering (QA), overlooking the fundamental challenges of efficient retrieval, comprehension, and reasoning within dense visual documents. To bridge this gap, we introduce ViDoSeek, a novel dataset designed to evaluate RAG performance on visually rich documents requiring complex reasoning. Based on it, we identify key limitations in current RAG approaches: (i) purely visual retrieval methods struggle to effectively integrate both textual and visual features, and (ii) previous approaches often allocate insufficient reasoning tokens, limiting their effectiveness. To address these challenges, we propose #ViDoRAG, a novel multi-agent RAG framework tailored for complex reasoning across visual documents. ViDoRAG employs a Gaussian Mixture Model (GMM)-based hybrid strategy to effectively handle multi-modal retrieval. To further elicit the model's reasoning capabilities, we introduce an iterative agent workflow incorporating exploration, summarization, and reflection, providing a framework for investigating test-time scaling in RAG domains. Extensive experiments on ViDoSeek validate the effectiveness and generalization of our approach. Notably, ViDoRAG outperforms existing methods by over 10% on the competitive #ViDoSeek benchmark.

Paper: https://arxiv.org/pdf/2502.18017v1.pdf

Code: https://github.com/Alibaba-NLP/ViDoRAG
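
A small sketch of the GMM-based selection idea: rather than a fixed top-k, fit a two-component mixture over the per-query similarity scores and keep the candidates assigned to the higher-mean ("relevant") component. The scores here are made up, and the real ViDoRAG pipeline combines visual and textual features before this step.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def gmm_select(scores):
    gm = GaussianMixture(n_components=2, random_state=0).fit(scores.reshape(-1, 1))
    relevant = int(np.argmax(gm.means_.ravel()))        # component with the higher mean
    labels = gm.predict(scores.reshape(-1, 1))
    return np.where(labels == relevant)[0]              # indices of retained candidates

scores = np.array([0.91, 0.88, 0.86, 0.41, 0.39, 0.35, 0.33, 0.30])
print(gmm_select(scores))                               # likely keeps the first three
```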

https://t.me/DataScienceT ✅
03/09/2025, 07:51
t.me/datasciencet/20485
A-MEM: Agentic Memory for LLM Agents

17 Feb 2025 · Wujiang Xu, Zujie Liang, Kai Mei, Hang Gao, Juntao Tan, Yongfeng Zhang ·

While large language model (LLM) agents can effectively use external tools for complex real-world tasks, they require memory systems to leverage historical experiences. Current memory systems enable basic storage and retrieval but lack sophisticated memory organization, despite recent attempts to incorporate graph databases. Moreover, these systems' fixed operations and structures limit their adaptability across diverse tasks. To address this limitation, this paper proposes a novel agentic memory system for LLM agents that can dynamically organize memories in an agentic way. Following the basic principles of the Zettelkasten method, we designed our memory system to create interconnected knowledge networks through dynamic indexing and linking. When a new memory is added, we generate a comprehensive note containing multiple structured attributes, including contextual descriptions, keywords, and tags. The system then analyzes historical memories to identify relevant connections, establishing links where meaningful similarities exist. Additionally, this process enables memory evolution - as new memories are integrated, they can trigger updates to the contextual representations and attributes of existing historical memories, allowing the memory network to continuously refine its understanding. Our approach combines the structured organization principles of Zettelkasten with the flexibility of agent-driven decision making, allowing for more adaptive and context-aware memory management. Empirical experiments on six foundation models show superior improvement against existing SOTA baselines. The source code for evaluating performance is available at https://github.com/WujiangXu/AgenticMemory, while the source code of agentic memory system is available at https://github.com/agiresearch/A-mem.

Paper: https://arxiv.org/pdf/2502.12110v3.pdf

Code: https://github.com/wujiangxu/agenticmemory
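
A minimal sketch of the Zettelkasten-style note-and-link behaviour: each new memory becomes a structured note, gets linked to sufficiently similar historical notes, and the linked notes are updated in turn. Bag-of-words cosine similarity stands in for the embeddings and LLM-generated attributes the real system uses.

```python
from dataclasses import dataclass, field
from collections import Counter
import math

@dataclass
class MemoryNote:
    content: str
    tags: list = field(default_factory=list)
    links: list = field(default_factory=list)           # indices of related notes

def similarity(a, b):
    ca, cb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(ca[w] * cb[w] for w in ca)
    norm = math.sqrt(sum(v * v for v in ca.values())) * math.sqrt(sum(v * v for v in cb.values()))
    return dot / norm if norm else 0.0

def add_memory(store, note, threshold=0.25):
    """Link the new note to similar historical notes (both directions)."""
    idx = len(store)
    for i, old in enumerate(store):
        if similarity(note.content, old.content) >= threshold:
            note.links.append(i)
            old.links.append(idx)                        # memory evolution: old notes get updated too
    store.append(note)

store = []
add_memory(store, MemoryNote("user prefers short answers", tags=["style"]))
add_memory(store, MemoryNote("user asked for short answers again today", tags=["style"]))
print(store[0].links, store[1].links)
```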

#DataScience #ArtificialIntelligence #MachineLearning #PythonProgramming #DeepLearning #LLM #AIResearch #BigData #NeuralNetworks #DataAnalytics #NLP #DataVisualization #ScikitLearn #Pandas #NumPy #TensorFlow #AIethics #PredictiveModeling #GPUComputing #OpenSourceAI #RAG #Agents #GPT4

https://t.me/DataScienceT
03/08/2025, 08:45
t.me/datasciencet/20484
Escaping The Big Data Paradigm in Self-Supervised Representation Learning

25 Feb 2025 · Carlos Vélez García, Miguel Cazorla, Jorge Pomares ·

The reliance on large-scale datasets and extensive computational resources has become a major barrier to advancing representation learning in vision, especially in data-scarce domains. In this paper, we address the critical question: Can we escape the big data paradigm in self-supervised representation learning from images? We introduce #SCOTT (Sparse Convolutional Tokenizer for Transformers), a shallow tokenization architecture that is compatible with Masked Image Modeling (MIM) tasks. SCOTT injects convolutional inductive biases into Vision Transformers (ViTs), enhancing their efficacy in small-scale data regimes. Alongside, we propose to train on a Joint-Embedding Predictive Architecture within a MIM framework (MIM-JEPA), operating in latent representation space to capture more semantic features. Our approach enables ViTs to be trained from scratch on datasets orders of magnitude smaller than traditionally required -- without relying on massive external datasets for pretraining. We validate our method on three small-size, standard-resolution, fine-grained datasets: Oxford Flowers-102, Oxford IIIT Pets-37, and ImageNet-100. Despite the challenges of limited data and high intra-class similarity, frozen SCOTT models pretrained with MIM-JEPA significantly outperform fully supervised methods and achieve competitive results with SOTA approaches that rely on large-scale pretraining, complex image augmentations and bigger model sizes. By demonstrating that robust off-the-shelf representations can be learned with limited data, compute, and model sizes, our work paves the way for computer vision applications in resource-constrained environments such as medical imaging or robotics. Our findings challenge the prevailing notion that vast amounts of data are indispensable for effective representation learning in vision, offering a new pathway toward more accessible and inclusive advancements in the field.

Paper: https://arxiv.org/pdf/2502.18056v1.pdf

Code: https://github.com/inescopresearch/scott

Datasets: Oxford 102 Flower - Oxford-IIIT Pets - Imagenet100
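
A hedged PyTorch sketch of the tokenizer idea: a shallow convolutional stem (a dense stand-in for the sparse convolutional tokenizer) produces a ViT-style token sequence with total stride 16. Dimensions are illustrative, and the MIM-JEPA objective is not shown.

```python
import torch
import torch.nn as nn

class ConvTokenizer(nn.Module):
    def __init__(self, dim=192):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, dim // 2, kernel_size=3, stride=2, padding=1), nn.GELU(),
            nn.Conv2d(dim // 2, dim, kernel_size=3, stride=2, padding=1), nn.GELU(),
            nn.Conv2d(dim, dim, kernel_size=4, stride=4),    # total stride 16, like a ViT patch
        )

    def forward(self, x):
        x = self.net(x)                                      # (B, dim, H/16, W/16)
        return x.flatten(2).transpose(1, 2)                  # (B, N, dim) token sequence

tokens = ConvTokenizer()(torch.randn(2, 3, 224, 224))
encoder = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model=192, nhead=6, batch_first=True), num_layers=2)
print(encoder(tokens).shape)                                 # torch.Size([2, 196, 192])
```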

#DataScience #ArtificialIntelligence #MachineLearning #PythonProgramming #DeepLearning #LLM #AIResearch #BigData #NeuralNetworks #DataAnalytics #NLP #AutoML #DataVisualization #ScikitLearn #Pandas #NumPy #TensorFlow #AIethics #PredictiveModeling #GPUComputing #OpenSourceAI #DeepSeek #RAG #Agents #GPT4

https://t.me/DataScienceT
03/06/2025, 05:18
t.me/datasciencet/20483
Distill Any Depth: Distillation Creates a Stronger Monocular Depth Estimator

26 Feb 2025 · Xiankang He, Dongyan Guo, Hongji Li, Ruibo Li, Ying Cui, Chi Zhang ·

Monocular depth estimation (#MDE) aims to predict scene depth from a single RGB image and plays a crucial role in 3D scene understanding. Recent advances in zero-shot MDE leverage normalized depth representations and distillation-based learning to improve generalization across diverse scenes. However, current depth normalization methods for distillation, relying on global normalization, can amplify noisy pseudo-labels, reducing distillation effectiveness. In this paper, we systematically analyze the impact of different depth normalization strategies on pseudo-label distillation. Based on our findings, we propose Cross-Context Distillation, which integrates global and local depth cues to enhance pseudo-label quality. Additionally, we introduce a multi-teacher distillation framework that leverages complementary strengths of different depth estimation models, leading to more robust and accurate depth predictions. Extensive experiments on benchmark datasets demonstrate that our approach significantly outperforms state-of-the-art methods, both quantitatively and qualitatively.
Paper: https://arxiv.org/pdf/2502.19204v1.pdf

Code: https://github.com/Westlake-AGI-Lab/Distill-Any-Depth

Datasets: ScanNet - NYUv2 - ETH3D

Note: Ranked #1 on Depth Estimation on ScanNetV2
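
A NumPy sketch contrasting global normalization with an added local (cropped) term, in the spirit of cross-context distillation: both student and teacher depth maps are normalized in a scale-and-shift-invariant way before the L1 comparison. The normalization and loss weighting are simplifications of the paper's formulation.

```python
import numpy as np

def normalize(d):
    """Scale-and-shift-invariant normalization (median / mean absolute deviation)."""
    med = np.median(d)
    mad = np.mean(np.abs(d - med)) + 1e-6
    return (d - med) / mad

def distill_loss(student, teacher, crop=64):
    """Global term plus one local term on a random crop, both in normalized space."""
    h, w = teacher.shape
    y, x = np.random.randint(0, h - crop), np.random.randint(0, w - crop)
    global_term = np.abs(normalize(student) - normalize(teacher)).mean()
    local_term = np.abs(normalize(student[y:y+crop, x:x+crop]) -
                        normalize(teacher[y:y+crop, x:x+crop])).mean()
    return global_term + local_term

s, t = np.random.rand(256, 256), np.random.rand(256, 256)
print(distill_loss(s, t))
```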

#DataScience #ArtificialIntelligence #MachineLearning #PythonProgramming #DeepLearning #LLM #AIResearch #BigData #NeuralNetworks #DataAnalytics #NLP #AutoML #DataVisualization #ScikitLearn #Pandas #NumPy #TensorFlow #AIethics #PredictiveModeling #GPUComputing #OpenSourceAI #DeepSeek #RAG #Agents

https://t.me/DataScienceT
03/04/2025, 17:01
t.me/datasciencet/20482
PhotoDoodle: Learning Artistic Image Editing from Few-Shot Pairwise Data

20 Feb 2025 · Shijie Huang, Yiren Song, Yuxuan Zhang, Hailong Guo, Xueyin Wang, Mike Zheng Shou, Jiaming Liu ·

We introduce PhotoDoodle, a novel image editing framework designed to facilitate photo doodling by enabling artists to overlay decorative elements onto photographs. Photo doodling is challenging because the inserted elements must appear seamlessly integrated with the background, requiring realistic blending, perspective alignment, and contextual coherence. Additionally, the background must be preserved without distortion, and the artist's unique style must be captured efficiently from limited training data. These requirements are not addressed by previous methods that primarily focus on global style transfer or regional inpainting. The proposed method, PhotoDoodle, employs a two-stage training strategy. Initially, we train a general-purpose image editing model, OmniEditor, using large-scale data. Subsequently, we fine-tune this model with EditLoRA using a small, artist-curated dataset of before-and-after image pairs to capture distinct editing styles and techniques. To enhance consistency in the generated results, we introduce a positional encoding reuse mechanism. Additionally, we release a PhotoDoodle dataset featuring six high-quality styles. Extensive experiments demonstrate the advanced performance and robustness of our method in customized image editing, opening new possibilities for artistic creation.

Paper: https://arxiv.org/pdf/2502.14397v1.pdf

Code: https://github.com/showlab/PhotoDoodle
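
A generic LoRA adapter sketch of the kind the EditLoRA stage builds on: the base weight stays frozen and only a low-rank update is trained on the artist's before-and-after pairs. Module names and sizes are illustrative, not PhotoDoodle's actual layers.

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    def __init__(self, base: nn.Linear, rank=4, alpha=1.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad_(False)                      # base model stays frozen
        self.A = nn.Parameter(torch.randn(rank, base.in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(base.out_features, rank))
        self.scale = alpha / rank

    def forward(self, x):
        return self.base(x) + self.scale * (x @ self.A.t() @ self.B.t())

layer = LoRALinear(nn.Linear(64, 64))
trainable = sum(p.numel() for p in layer.parameters() if p.requires_grad)
print(trainable)                                         # only the low-rank factors train
```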

#DataScience #ArtificialIntelligence #MachineLearning #PythonProgramming #DeepLearning #LLM #AIResearch #BigData #NeuralNetworks #DataAnalytics #NLP #AutoML #DataVisualization #ScikitLearn #Pandas #NumPy #TensorFlow #AIethics #PredictiveModeling #GPUComputing #OpenSourceAI #DeepSeek #RAG #Agents

https://t.me/DataScienceT
03/04/2025, 14:15
t.me/datasciencet/20480
Hawk: Learning to Understand Open-World Video Anomalies

27 May 2024 · Jiaqi Tang, Hao Lu, Ruizheng Wu, Xiaogang Xu, Ke Ma, Cheng Fang, Bin Guo, Jiangbo Lu, Qifeng Chen, Ying-Cong Chen ·

Video Anomaly Detection (#VAD) systems can autonomously monitor and identify disturbances, reducing the need for manual labor and associated costs. However, current VAD systems are often limited by their superficial semantic understanding of scenes and minimal user interaction. Additionally, the prevalent data scarcity in existing datasets restricts their applicability in open-world scenarios. In this paper, we introduce Hawk, a novel framework that leverages interactive large Visual Language Models (#VLM) to interpret video anomalies precisely. Recognizing the difference in motion information between abnormal and normal videos, Hawk explicitly integrates motion modality to enhance anomaly identification. To reinforce motion attention, we construct an auxiliary consistency loss within the motion and video space, guiding the video branch to focus on the motion modality. Moreover, to improve the interpretation of motion-to-language, we establish a clear supervisory relationship between motion and its linguistic representation. Furthermore, we have annotated over 8,000 anomaly videos with language descriptions, enabling effective training across diverse open-world scenarios, and also created 8,000 question-answering pairs for users' open-world questions. The final results demonstrate that #Hawk achieves SOTA performance, surpassing existing baselines in both video description generation and question-answering. Our codes/dataset/demo will be released at https://github.com/jqtangust/hawk.

Paper: https://arxiv.org/pdf/2405.16886v1.pdf

Code: https://github.com/jqtangust/hawk

Dataset: Hawk Annotation Dataset
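
A hedged sketch of the motion-branch idea: derive an explicit motion signal from frame differences and add an auxiliary consistency loss that pulls motion and video embeddings together. The "features" below are toy global averages, not Hawk's encoders.

```python
import torch
import torch.nn.functional as F

def motion_from_frames(video):                    # video: (B, T, C, H, W)
    return video[:, 1:] - video[:, :-1]           # simple temporal difference as a motion proxy

def consistency_loss(video_feat, motion_feat):
    """Cosine-based auxiliary loss pulling the two modality embeddings together."""
    return 1.0 - F.cosine_similarity(video_feat, motion_feat, dim=-1).mean()

video = torch.randn(2, 8, 3, 32, 32)
motion = motion_from_frames(video)
video_feat = video.mean(dim=(1, 3, 4))            # (B, C) toy global features
motion_feat = motion.mean(dim=(1, 3, 4))
print(consistency_loss(video_feat, motion_feat))
```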

#DataScience #ArtificialIntelligence #MachineLearning #PythonProgramming #DeepLearning #LLM #AIResearch #BigData #NeuralNetworks #DataAnalytics #NLP #AutoML #DataVisualization #ScikitLearn #Pandas #NumPy #TensorFlow #AIethics #PredictiveModeling #GPUComputing #OpenSourceAI #DeepSeek #RAG #Agents

https://t.me/DataScienceT
03/03/2025, 16:42
t.me/datasciencet/20479
Magma: A Foundation Model for Multimodal AI Agents

18 Feb 2025 · Jianwei Yang, Reuben Tan, Qianhui Wu, Ruijie Zheng, Baolin Peng, Yongyuan Liang, Yu Gu, Mu Cai, Seonghyeon Ye, Joel Jang, Yuquan Deng, Lars Liden, Jianfeng Gao ·

We present Magma, a foundation model that serves multimodal AI agentic tasks in both the digital and physical worlds. Magma is a significant extension of vision-language (VL) models in that it not only retains the VL understanding ability (verbal intelligence) of the latter, but is also equipped with the ability to plan and act in the visual-spatial world (spatial-temporal intelligence) and complete agentic tasks ranging from UI navigation to robot manipulation. To endow the agentic capabilities, Magma is pretrained on large amounts of heterogeneous datasets spanning from images, videos to robotics data, where the actionable visual objects (e.g., clickable buttons in GUI) in images are labeled by Set-of-Mark (SoM) for action grounding, and the object movements (e.g., the trace of human hands or robotic arms) in videos are labeled by Trace-of-Mark (ToM) for action planning. Extensive experiments show that #SoM and ToM reach great synergy and facilitate the acquisition of spatial-temporal intelligence for our Magma model, which is fundamental to a wide range of tasks as shown in Fig.1. In particular, #Magma creates new state-of-the-art results on UI navigation and robotic manipulation tasks, outperforming previous models that are specifically tailored to these tasks. On image and video-related multimodal tasks, Magma also compares favorably to popular large multimodal models that are trained on much larger datasets. We make our model and code public for reproducibility at https://microsoft.github.io/Magma.

Paper: https://arxiv.org/pdf/2502.13130v1.pdf

Code: https://github.com/microsoft/Magma

Datasets: Something-Something V2 - EPIC-KITCHENS-100 - Open-X-Embodiment - Ego4D
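
A minimal sketch of Set-of-Mark style labeling: overlay numbered marks on candidate actionable regions so a model can refer to "mark 2" instead of raw pixel coordinates. The boxes are hypothetical; a real pipeline would get them from a UI or object detector, and Trace-of-Mark would additionally record per-frame trajectories.

```python
from PIL import Image, ImageDraw

def draw_set_of_mark(image, boxes):
    draw = ImageDraw.Draw(image)
    for i, (x0, y0, x1, y1) in enumerate(boxes, start=1):
        draw.rectangle([x0, y0, x1, y1], outline="red", width=2)
        draw.text((x0 + 2, y0 + 2), str(i), fill="red")   # the mark the model refers to
    return image

img = Image.new("RGB", (320, 240), "white")
marked = draw_set_of_mark(img, [(20, 20, 120, 60), (150, 100, 300, 140)])
marked.save("som_demo.png")
```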

#DataScience #ArtificialIntelligence #MachineLearning #PythonProgramming #DeepLearning #LLM #AIResearch #BigData #NeuralNetworks #DataAnalytics #NLP #AutoML #DataVisualization #ScikitLearn #Pandas #NumPy #TensorFlow #AIethics #PredictiveModeling #GPUComputing #OpenSourceAI #DeepSeek #RAG #Agents

https://t.me/DataScienceT
03/03/2025, 08:27
t.me/datasciencet/20478
From System 1 to System 2: A Survey of Reasoning Large Language Models

24 Feb 2025 · Zhong-Zhi Li, Duzhen Zhang, Ming-Liang Zhang, Jiaxin Zhang, Zengyan Liu, Yuxuan Yao, Haotian Xu, Junhao Zheng, Pei-Jie Wang, Xiuyi Chen, Yingying Zhang, Fei Yin, Jiahua Dong, Zhijiang Guo, Le Song, Cheng-Lin Liu ·

Achieving human-level intelligence requires refining the transition from the fast, intuitive System 1 to the slower, more deliberate System 2 reasoning. While System 1 excels in quick, heuristic decisions, System 2 relies on logical reasoning for more accurate judgments and reduced biases. Foundational Large Language Models (LLMs) excel at fast decision-making but lack the depth for complex reasoning, as they have not yet fully embraced the step-by-step analysis characteristic of true System 2 thinking. Recently, reasoning LLMs like OpenAI's o1/o3 and DeepSeek's R1 have demonstrated expert-level performance in fields such as mathematics and coding, closely mimicking the deliberate reasoning of System 2 and showcasing human-like cognitive abilities. This survey begins with a brief overview of the progress in foundational LLMs and the early development of System 2 technologies, exploring how their combination has paved the way for reasoning LLMs. Next, we discuss how to construct reasoning #LLMs, analyzing their features, the core methods enabling advanced reasoning, and the evolution of various reasoning LLMs. Additionally, we provide an overview of reasoning benchmarks, offering an in-depth comparison of the performance of representative reasoning LLMs. Finally, we explore promising directions for advancing reasoning LLMs and maintain a real-time GitHub repository (https://github.com/zzli2022/Awesome-Slow-Reason-System) to track the latest developments. We hope this survey will serve as a valuable resource to inspire innovation and drive progress in this rapidly evolving field.

Paper: https://arxiv.org/pdf/2502.17419v1.pdf

Code: https://github.com/zzli2022/awesome-slow-reason-system

Datasets: GSM8K - MedQA - MathVista - GPQA - MMLU-Pro - PGPS9K

#DataScience #ArtificialIntelligence #MachineLearning #PythonProgramming #DeepLearning #LLM #AIResearch #BigData #NeuralNetworks #DataAnalytics #NLP #AutoML #DataVisualization #ScikitLearn #Pandas #NumPy #TensorFlow #AIethics #PredictiveModeling #GPUComputing #OpenSourceAI #DeepSeek #RAG #Agents

https://t.me/DataScienceT
03/03/2025, 08:02
t.me/datasciencet/20477
AISafetyLab: A Comprehensive Framework for AI Safety Evaluation and Improvement

🖥 Github: https://github.com/thu-coai/AISafetyLab

📕 Paper: https://arxiv.org/abs/2502.16776v1

🌟 Dataset: https://paperswithcode.com/dataset/gptfuzzer

https://t.me/DataScienceT 🧠
02/27/2025, 21:39
t.me/datasciencet/20474
🎨 Can AI design truly novel concepts like humans? Check SYNTHIA, a breakthrough in T2I generation!

🤖 SYNTHIA composes affordances to create visually novel & functionally coherent designs.

📄 https://arxiv.org/pdf/2502.17793
💻 https://github.com/HyeonjeongHa/SYNTHIA
🎥 https://youtube.com/watch?v=KvsOx44WdzM

#DataScience #ArtificialIntelligence #MachineLearning #PythonProgramming #DeepLearning #LLM #AIResearch #BigData #NeuralNetworks #DataAnalytics #NLP #AutoML #DataVisualization #ScikitLearn #Pandas #NumPy #TensorFlow #DeepSeek #RAG #Agents

https://t.me/DataScienceT
02/27/2025, 12:43
t.me/datasciencet/20472
Slamming: Training a Speech Language Model on One GPU in a Day

19 Feb 2025 · Gallil Maimon, Avishai Elmakies, Yossi Adi ·

We introduce Slam, a recipe for training high-quality Speech Language Models (SLMs) on a single academic GPU in 24 hours. We do so through empirical analysis of model initialisation and architecture, synthetic training data, preference optimisation with synthetic data and tweaking all other components. We empirically demonstrate that this training recipe also scales well with more compute getting results on par with leading SLMs in a fraction of the compute cost. We hope these insights will make SLM training and research more accessible. In the context of SLM scaling laws, our results far outperform predicted compute optimal performance, giving an optimistic view to #SLM feasibility. See code, data, models, samples at - https://pages.cs.huji.ac.il/adiyoss-lab/slamming .

Paper: https://arxiv.org/pdf/2502.15814v1.pdf

Code: https://github.com/slp-rl/slamkit

#DataScience #ArtificialIntelligence #MachineLearning #PythonProgramming #DeepLearning #LLM #AIResearch #BigData #NeuralNetworks #DataAnalytics #NLP #AutoML #DataVisualization #ScikitLearn #Pandas #NumPy #TensorFlow #AIethics #PredictiveModeling #GPUComputing #OpenSourceAI #DeepSeek #RAG #Agents

https://t.me/DataScienceT
02/27/2025, 12:42
t.me/datasciencet/20471
Fractal Generative Models

24 Feb 2025 · Tianhong Li, Qinyi Sun, Lijie Fan, Kaiming He ·

Modularization is a cornerstone of computer science, abstracting complex functions into atomic building blocks. In this paper, we introduce a new level of modularization by abstracting generative models into atomic generative modules. Analogous to fractals in mathematics, our method constructs a new type of generative model by recursively invoking atomic generative modules, resulting in self-similar fractal architectures that we call fractal generative models. As a running example, we instantiate our fractal framework using autoregressive models as the atomic generative modules and examine it on the challenging task of pixel-by-pixel image generation, demonstrating strong performance in both likelihood estimation and generation quality. We hope this work could open a new paradigm in generative modeling and provide a fertile ground for future research. Code is available at https://github.com/LTH14/fractalgen.

Paper: https://arxiv.org/pdf/2502.17437v1.pdf

Code: https://github.com/LTH14/fractalgen
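
A toy sketch of the fractal construction: one atomic generative module is invoked recursively, with each level assembling and then refining four sub-blocks. The "module" here just adds noise; the point is the self-similar call structure, not the actual autoregressive modules.

```python
import numpy as np

def atomic_generate(block):
    """Stand-in for an autoregressive module conditioned on the parent block."""
    return block + 0.1 * np.random.randn(*block.shape)

def fractal_generate(size, depth):
    if depth == 0:
        return atomic_generate(np.zeros((size, size)))
    half = size // 2
    out = np.zeros((size, size))
    for i in range(2):
        for j in range(2):                                  # recurse into four sub-blocks
            out[i*half:(i+1)*half, j*half:(j+1)*half] = fractal_generate(half, depth - 1)
    return atomic_generate(out)                             # the parent module refines the assembly

print(fractal_generate(size=16, depth=2).shape)             # (16, 16)
```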

#DataScience #ArtificialIntelligence #MachineLearning #PythonProgramming #DeepLearning #LLM #AIResearch #BigData #NeuralNetworks #DataAnalytics #NLP #AutoML #DataVisualization #ScikitLearn #Pandas #NumPy #TensorFlow #AIethics #PredictiveModeling #GPUComputing #OpenSourceAI #DeepSeek #RAG #Agents

https://t.me/DataScienceT
02/26/2025, 12:20
t.me/datasciencet/20470
HealthGPT: A Medical Large Vision-Language Model for Unifying Comprehension and Generation via Heterogeneous Knowledge Adaptation

14 Feb 2025 · Tianwei Lin, Wenqiao Zhang, Sijing Li, Yuqian Yuan, Binhe Yu, Haoyuan Li, Wanggui He, Hao Jiang, Mengze Li, Xiaohui Song, Siliang Tang, Jun Xiao, Hui Lin, Yueting Zhuang, Beng Chin Ooi ·

We present #HealthGPT, a powerful Medical Large Vision-Language Model (Med-LVLM) that integrates medical visual comprehension and generation capabilities within a unified autoregressive paradigm. Our bootstrapping philosophy is to progressively adapt heterogeneous comprehension and generation knowledge to pre-trained large language models (#LLMs). This is achieved through a novel heterogeneous low-rank adaptation (H-LoRA) technique, which is complemented by a tailored hierarchical visual perception approach and a three-stage learning strategy. To effectively learn the HealthGPT, we devise a comprehensive medical domain-specific comprehension and generation dataset called VL-Health. Experimental results demonstrate exceptional performance and scalability of HealthGPT in medical visual unified tasks.

Paper: https://github.com/dcdmllm/healthgpt

Code: https://github.com/dcdmllm/healthgpt
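
A hedged sketch of the heterogeneous low-rank adaptation idea: a frozen base projection plus separate low-rank adapters routed by task (comprehension vs. generation). Names and sizes are illustrative, not HealthGPT's H-LoRA implementation.

```python
import torch
import torch.nn as nn

class HLoRALinear(nn.Module):
    def __init__(self, in_f=64, out_f=64, rank=4, tasks=("comprehension", "generation")):
        super().__init__()
        self.base = nn.Linear(in_f, out_f)
        self.base.weight.requires_grad_(False)               # shared backbone stays frozen
        self.base.bias.requires_grad_(False)
        self.adapters = nn.ModuleDict({
            t: nn.Sequential(nn.Linear(in_f, rank, bias=False), nn.Linear(rank, out_f, bias=False))
            for t in tasks
        })

    def forward(self, x, task):
        return self.base(x) + self.adapters[task](x)         # task-specific low-rank update

layer = HLoRALinear()
x = torch.randn(2, 64)
print(layer(x, "comprehension").shape, layer(x, "generation").shape)
```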

#DataScience #ArtificialIntelligence #MachineLearning #PythonProgramming #DeepLearning #LLM #AIResearch #BigData #NeuralNetworks #DataAnalytics #NLP #AutoML #DataVisualization #ScikitLearn #Pandas #NumPy #TensorFlow #AIethics #PredictiveModeling #GPUComputing #OpenSourceAI #DeepSeek #RAG #Agents

https://t.me/DataScienceT
02/25/2025, 18:53
t.me/datasciencet/20469
Zep: A Temporal Knowledge Graph Architecture for Agent Memory

20 Jan 2025 · Preston Rasmussen, Pavlo Paliychuk, Travis Beauvais, Jack Ryan, Daniel Chalef ·

We introduce Zep, a novel memory layer service for AI agents that outperforms the current state-of-the-art system, MemGPT, in the Deep Memory Retrieval (DMR) benchmark. Additionally, Zep excels in more comprehensive and challenging evaluations than DMR that better reflect real-world enterprise use cases. While existing retrieval-augmented generation (#RAG) frameworks for large language model (LLM)-based agents are limited to static document retrieval, enterprise applications demand dynamic knowledge integration from diverse sources including ongoing conversations and business data. Zep addresses this fundamental limitation through its core component Graphiti -- a temporally-aware knowledge graph engine that dynamically synthesizes both unstructured conversational data and structured business data while maintaining historical relationships. In the #DMR benchmark, which the MemGPT team established as their primary evaluation metric, Zep demonstrates superior performance (94.8% vs 93.4%). Beyond DMR, Zep's capabilities are further validated through the more challenging LongMemEval benchmark, which better reflects enterprise use cases through complex temporal reasoning tasks. In this evaluation, #Zep achieves substantial results with accuracy improvements of up to 18.5% while simultaneously reducing response latency by 90% compared to baseline implementations. These results are particularly pronounced in enterprise-critical tasks such as cross-session information synthesis and long-term context maintenance, demonstrating Zep's effectiveness for deployment in real-world applications.

Paper: https://arxiv.org/pdf/2501.13956v1.pdf

Code: https://github.com/getzep/graphiti
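
A small sketch of a temporally-aware fact edge: each fact carries a validity interval, and asserting a newer conflicting fact closes out the older one instead of deleting it. This mirrors the idea described for Graphiti, not its actual API.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional

@dataclass
class Fact:
    subject: str
    predicate: str
    obj: str
    valid_from: datetime
    valid_to: Optional[datetime] = None          # None = still believed true

def assert_fact(graph, new: Fact):
    """Close out any conflicting fact before adding the new one."""
    for f in graph:
        if f.subject == new.subject and f.predicate == new.predicate and f.valid_to is None:
            f.valid_to = new.valid_from          # the old fact stops being current
    graph.append(new)

graph = []
assert_fact(graph, Fact("user", "works_at", "Acme", datetime(2024, 1, 5)))
assert_fact(graph, Fact("user", "works_at", "Globex", datetime(2025, 2, 1)))
print([(f.obj, f.valid_to) for f in graph])
```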

#DataScience #ArtificialIntelligence #MachineLearning #PythonProgramming #DeepLearning #LLM #AIResearch #BigData #NeuralNetworks #DataAnalytics #NLP #AutoML #DataVisualization #ScikitLearn #Pandas #NumPy #TensorFlow #AIethics #PredictiveModeling #GPUComputing #OpenSourceAI #DeepSeek #RAG #Agents

https://t.me/DataScienceT
02/24/2025, 10:05
t.me/datasciencet/20468
OSUM: Advancing Open Speech Understanding Models with Limited Resources in Academia

Large Language Models (LLMs) have made significant progress in various downstream tasks, inspiring the development of Speech Understanding Language Models (SULMs) to enable comprehensive speech-based interactions. However, most advanced SULMs are developed by the industry, leveraging large-scale datasets and computational resources that are not readily available to the academic community. Moreover, the lack of transparency in training details creates additional barriers to further innovation. In this study, we present OSUM, an Open Speech Understanding Model designed to explore the potential of training SULMs under constrained academic resources. The OSUM model combines a Whisper encoder with a Qwen2 LLM and supports a wide range of speech tasks, including speech recognition (ASR), speech recognition with timestamps (SRWT), vocal event detection (VED), speech emotion recognition (SER), speaking style recognition (SSR), speaker gender classification (SGC), speaker age prediction (SAP), and speech-to-text chat (STTC). By employing an ASR+X training strategy, OSUM achieves efficient and stable multi-task training by simultaneously optimizing ASR alongside target tasks. Beyond delivering strong performance, OSUM emphasizes transparency by providing openly available data preparation and training methodologies, offering valuable insights and practical guidance for the academic community. By doing so, we aim to accelerate research and innovation in advanced SULM technologies.

Paper: https://arxiv.org/pdf/2501.13306v2.pdf

Code: https://github.com/aslp-lab/osum

Datasets: LibriSpeech - IEMOCAP
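
A schematic reading of the ASR+X strategy: every batch jointly optimizes the ASR objective and the target task's objective. The separate classification head and loss weighting below are simplifications for illustration, not the actual OSUM training code.

```python
import torch

def asr_plus_x_loss(asr_logits, asr_targets, x_logits, x_targets, x_weight=1.0):
    ce = torch.nn.functional.cross_entropy
    asr_loss = ce(asr_logits.flatten(0, 1), asr_targets.flatten())   # transcription objective
    x_loss = ce(x_logits, x_targets)                                  # e.g. an emotion/gender/age head
    return asr_loss + x_weight * x_loss

asr_logits, asr_targets = torch.randn(2, 10, 500), torch.randint(0, 500, (2, 10))
x_logits, x_targets = torch.randn(2, 4), torch.randint(0, 4, (2,))
print(asr_plus_x_loss(asr_logits, asr_targets, x_logits, x_targets))
```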

#DataScience #ArtificialIntelligence #MachineLearning #PythonProgramming #DeepLearning #LLM #AIResearch #BigData #NeuralNetworks #DataAnalytics #NLP #AutoML #DataVisualization #ScikitLearn #Pandas #NumPy #TensorFlow #AIethics #PredictiveModeling #GPUComputing #OpenSourceAI #DeepSeek

https://t.me/DataScienceT
02/23/2025, 09:52
t.me/datasciencet/20467
KET-RAG: A Cost-Efficient Multi-Granular Indexing Framework for Graph-RAG

13 Feb 2025 · Yiqian Huang, Shiqi Zhang, Xiaokui Xiao ·

Graph-RAG constructs a knowledge graph from text chunks to improve retrieval in Large Language Model (LLM)-based question answering. It is particularly useful in domains such as biomedicine, law, and political science, where retrieval often requires multi-hop reasoning over proprietary documents. Some existing Graph-RAG systems construct #KNN graphs based on text chunk relevance, but this coarse-grained approach fails to capture entity relationships within texts, leading to sub-par retrieval and generation quality. To address this, recent solutions leverage LLMs to extract entities and relationships from text chunks, constructing triplet-based knowledge graphs. However, this approach incurs significant indexing costs, especially for large document collections. To ensure a good result accuracy while reducing the indexing cost, we propose KET-RAG, a multi-granular indexing framework. KET-RAG first identifies a small set of key text chunks and leverages an #LLM to construct a knowledge graph skeleton. It then builds a text-keyword bipartite graph from all text chunks, serving as a lightweight alternative to a full knowledge graph. During retrieval, KET-RAG searches both structures: it follows the local search strategy of existing Graph-RAG systems on the skeleton while mimicking this search on the bipartite graph to improve retrieval quality. We evaluate eight solutions on two real-world datasets, demonstrating that KET-RAG outperforms all competitors in indexing cost, retrieval effectiveness, and generation quality. Notably, it achieves comparable or superior retrieval quality to Microsoft's Graph-RAG while reducing indexing costs by over an order of magnitude. Additionally, it improves the generation quality by up to 32.4% while lowering indexing costs by around 20%.

Paper: https://arxiv.org/pdf/2502.09304v1.pdf

Code: https://github.com/waetr/KET-RAG
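
A small sketch of the two index granularities: a lightweight text-keyword bipartite graph over all chunks, plus a handful of key chunks reserved for the expensive knowledge-graph skeleton. The regex "keywords" and degree-based key-chunk selection are crude stand-ins for the paper's extraction and selection steps.

```python
import networkx as nx
import re

chunks = {
    "c1": "Graph-RAG builds a knowledge graph from text chunks for retrieval.",
    "c2": "Knowledge graph indexing with an LLM is expensive for large corpora.",
    "c3": "A keyword bipartite graph is a lightweight alternative index.",
}

bipartite = nx.Graph()
for cid, text in chunks.items():
    for kw in set(re.findall(r"[a-z]{5,}", text.lower())):      # crude keyword extraction
        bipartite.add_edge(cid, kw)

# pick key chunks (highest-degree chunk nodes) for the costly KG skeleton
degrees = {cid: bipartite.degree(cid) for cid in chunks}
key_chunks = sorted(degrees, key=degrees.get, reverse=True)[:1]
print("skeleton built from:", key_chunks)
print("chunks touching 'graph':", list(bipartite.neighbors("graph")))
```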

#DataScience #ArtificialIntelligence #MachineLearning #PythonProgramming #DeepLearning #LLM #AIResearch #BigData #NeuralNetworks #DataAnalytics #NLP #AutoML #DataVisualization #ScikitLearn #Pandas #NumPy #TensorFlow #AIethics #PredictiveModeling #GPUComputing #OpenSourceAI #DeepSeek

https://t.me/DataScienceT
02/22/2025, 14:08
t.me/datasciencet/20461
Bridging Text and Vision: A Multi-View Text-Vision Registration Approach for Cross-Modal Place Recognition

🖥 Github: https://github.com/nuozimiaowu/Text4VPR

📕 Paper: https://arxiv.org/abs/2502.14195v1

🌟 Dataset: https://paperswithcode.com/task/cross-modal-place-recognition

#DataScience #ArtificialIntelligence #MachineLearning #PythonProgramming #DeepLearning #LLM #AIResearch #BigData #NeuralNetworks #DataAnalytics #NLP #AutoML #DataVisualization #ScikitLearn #Pandas #NumPy #TensorFlow #AIethics #PredictiveModeling #GPUComputing #OpenSourceAI #DeepSeek

https://t.me/DataScienceT
02/22/2025, 13:23
t.me/datasciencet/20460
Step-Video-T2V Technical Report: The Practice, Challenges, and Future of Video Foundation Model

We present Step-Video-T2V, a state-of-the-art text-to-video pre-trained model with 30B parameters and the ability to generate videos up to 204 frames in length. A deep compression Variational Autoencoder, Video-VAE, is designed for video generation tasks, achieving 16x16 spatial and 8x temporal compression ratios, while maintaining exceptional video reconstruction quality. User prompts are encoded using two bilingual text encoders to handle both English and Chinese. A DiT with 3D full attention is trained using Flow Matching and is employed to denoise input noise into latent frames. A video-based DPO approach, Video-DPO, is applied to reduce artifacts and improve the visual quality of the generated videos. We also detail our training strategies and share key observations and insights. Step-Video-T2V's performance is evaluated on a novel video generation benchmark, Step-Video-T2V-Eval, demonstrating its state-of-the-art text-to-video quality when compared with both open-source and commercial engines. Additionally, we discuss the limitations of current diffusion-based model paradigm and outline future directions for video foundation models. We make both Step-Video-T2V and Step-Video-T2V-Eval available at https://github.com/stepfun-ai/Step-Video-T2V. The online version can be accessed from https://yuewen.cn/videos as well. Our goal is to accelerate the innovation of video foundation models and empower video content creators.

Paper: https://arxiv.org/pdf/2502.10248v1.pdf

Codes:
https://github.com/phixion/phixion
https://github.com/stepfun-ai/step-video-t2v
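
Quick arithmetic on the stated compression ratios (16x16 spatial, 8x temporal): a 204-frame clip shrinks to a far smaller latent grid. The 544x992 resolution is chosen here only for illustration.

```python
frames, h, w = 204, 544, 992
t_stride, s_stride = 8, 16                     # 8x temporal, 16x16 spatial compression
latent_t, latent_h, latent_w = frames // t_stride, h // s_stride, w // s_stride
print(latent_t, latent_h, latent_w)            # 25 x 34 x 62 latent grid
print((frames * h * w) / (latent_t * latent_h * latent_w))   # roughly 2000x fewer positions
```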

#DataScience #ArtificialIntelligence #MachineLearning #PythonProgramming #DeepLearning #LLM #AIResearch #BigData #NeuralNetworks #DataAnalytics #NLP #AutoML #DataVisualization #ScikitLearn #Pandas #NumPy #TensorFlow #AIethics #PredictiveModeling #GPUComputing #OpenSourceAI #DeepSeek

https://t.me/DataScienceT
02/22/2025, 09:39
t.me/datasciencet/20458
The Hundred-Page Language Models Book

Read it:
https://github.com/aburkov/theLMbook

#LLM #NLP #ML #AI #PYTHON #PYTORCH

https://t.me/DataScienceM
02/19/2025, 15:13
t.me/datasciencet/20457
Accelerating Data Processing and Benchmarking of AI Models for Pathology

10 Feb 2025 · Andrew Zhang, Guillaume Jaume, Anurag Vaidya, Tong Ding, Faisal Mahmood

Advances in foundation modeling have reshaped computational pathology. However, the increasing number of available models and lack of standardized benchmarks make it increasingly complex to assess their strengths, limitations, and potential for further development. To address these challenges, we introduce a new suite of software tools for whole-slide image processing, foundation model benchmarking, and curated publicly available tasks. We anticipate that these resources will promote transparency, reproducibility, and continued progress in the field.

Paper: https://arxiv.org/pdf/2502.06750v1.pdf

Codes:
https://github.com/mahmoodlab/trident
https://github.com/mahmoodlab/patho-bench

#DataScience #ArtificialIntelligence #MachineLearning #PythonProgramming #DeepLearning #LLM #AIResearch #BigData #NeuralNetworks #DataAnalytics #NLP #AutoML #DataVisualization #ScikitLearn #Pandas #NumPy #TensorFlow #AIethics #PredictiveModeling #GPUComputing #OpenSourceAI #DeepSeek

https://t.me/DataScienceT
02/19/2025, 11:01
t.me/datasciencet/20456
Enhance-A-Video: Better Generated Video for Free

11 Feb 2025 · Yang Luo, Xuanlei Zhao, Mengzhao Chen, Kaipeng Zhang, Wenqi Shao, Kai Wang, Zhangyang Wang, Yang You

DiT-based video generation has achieved remarkable results, but research into enhancing existing models remains relatively unexplored. In this work, we introduce a training-free approach to enhance the coherence and quality of DiT-based generated videos, named Enhance-A-Video. The core idea is enhancing the cross-frame correlations based on non-diagonal temporal attention distributions. Thanks to its simple design, our approach can be easily applied to most DiT-based video generation frameworks without any retraining or fine-tuning. Across various DiT-based video generation models, our approach demonstrates promising improvements in both temporal consistency and visual quality. We hope this research can inspire future explorations in video generation enhancement.

Paper: https://arxiv.org/pdf/2502.07508v1.pdf

Code: https://github.com/NUS-HPC-AI-Lab/Enhance-A-Video
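
A schematic sketch of acting on non-diagonal temporal attention: scale the cross-frame attention weights by an enhancement factor and renormalize each row. This is an illustrative re-implementation of the concept, not the released Enhance-A-Video code.

```python
import torch

def enhance_temporal_attention(attn, enhance=1.5):
    """attn: (..., T, T) softmaxed temporal attention weights."""
    T = attn.shape[-1]
    eye = torch.eye(T, device=attn.device, dtype=attn.dtype)
    boosted = attn * (eye + enhance * (1.0 - eye))       # scale only cross-frame weights
    return boosted / boosted.sum(dim=-1, keepdim=True)   # renormalize each row

attn = torch.softmax(torch.randn(2, 8, 16, 16), dim=-1)  # (batch, heads, T, T)
out = enhance_temporal_attention(attn)
print(out.sum(dim=-1).allclose(torch.ones(2, 8, 16)))    # rows still sum to one
```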

#DataScience #ArtificialIntelligence #MachineLearning #PythonProgramming #DeepLearning #LLM #AIResearch #BigData #NeuralNetworks #DataAnalytics #NLP #AutoML #DataVisualization #ScikitLearn #Pandas #NumPy #TensorFlow #AIethics #PredictiveModeling #GPUComputing #OpenSourceAI #DeepSeek

https://t.me/DataScienceT
02/19/2025, 10:17
t.me/datasciencet/20455