All HF Hub posts

m-ric posted an update 1 day ago
๐—š๐—ผ๐—ผ๐—ด๐—น๐—ฒ ๐—ฝ๐—ฎ๐—ฝ๐—ฒ๐—ฟ : ๐˜€๐—ฐ๐—ฎ๐—น๐—ถ๐—ป๐—ด ๐˜‚๐—ฝ ๐—ถ๐—ป๐—ณ๐—ฒ๐—ฟ๐—ฒ๐—ป๐—ฐ๐—ฒ ๐—ฐ๐—ผ๐—บ๐—ฝ๐˜‚๐˜๐—ฒ ๐—ฏ๐—ฒ๐—ฎ๐˜๐˜€ ๐Ÿญ๐Ÿฐ๐˜… ๐—น๐—ฎ๐—ฟ๐—ด๐—ฒ๐—ฟ ๐—บ๐—ผ๐—ฑ๐—ฒ๐—น๐˜€ ๐Ÿš€

Remember scaling laws? These are empirical laws that say "the bigger your model, the better it gets". More precisely, "as your compute increases exponentially, loss decreases linearly". They have wild implications, suggesting that spending 100x more training compute would get you super-LLMs. That's why companies are racing to build the biggest AI superclusters ever, and Meta bought 350k H100 GPUs, which probably cost on the order of $10B.
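
For reference, the commonly cited (Kaplan-style) form of the compute scaling law is a power law, which shows up as a straight line on a log-log plot; the exponent below is the usual ballpark figure, not a number from this paper:

```latex
L(C) \approx \left(\frac{C_c}{C}\right)^{\alpha_C}, \qquad \alpha_C \approx 0.05
```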

But think of this: we're building huge reasoning machines, yet we only ask them to do one pass through the model to get each token of the final answer, i.e. we expend minimal effort on inference. That's like building a Caterpillar truck and making it run on a lawnmower's motor. 🚚🛵 Couldn't we optimize this? 🤔

💡 So instead of scaling up training by training even bigger models on many more trillions of tokens, Google researchers explored an under-explored avenue: scaling up inference compute.

They explore two methods for spending more compute at inference time: either a reviser model that iteratively adapts the model's answer distribution, or generating N different completions (for instance through beam search) and selecting only the best one with an additional verifier model.
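
As a rough illustration of the second strategy (not the paper's actual code), here is what "sample N, keep the verifier's favorite" boils down to; `generate` and `score` are hypothetical stand-ins for the base model and the verifier:

```python
# Minimal best-of-N sketch: spend extra inference compute by sampling several
# completions and letting a verifier model pick the one it scores highest.
def best_of_n(prompt, generate, score, n=16):
    candidates = [generate(prompt, temperature=0.8) for _ in range(n)]
    return max(candidates, key=lambda answer: score(prompt, answer))
```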

They use a PaLM 2 model (released in May 2023) on the MATH dataset: PaLM 2 has the advantage of scoring low on MATH, but not zero, so improvements are clearly noticeable.

And the results show that, for the same fixed amount of inference compute:
💥 a smaller model with more effort spent on decoding beats a 14x bigger model using naive greedy sampling.

That means you can divide your training costs by 14 and still get the same performance for the same inference cost!

Take that, scaling laws. Mark Zuckerberg, you're welcome, hope I can get some of these H100s.

Read the paper here 👉 Scaling LLM Test-Time Compute Optimally can be More Effective than Scaling Model Parameters (2408.03314)
davidberenstein1957 posted an update 1 day ago
🚀 We will be generating a preference dataset for DPO/ORPO and cleaning it with AI feedback during our upcoming meetup!

In this session, we'll walk you through the essentials of building a distilabel pipeline by exploring two key use cases: cleaning an existing dataset and generating a preference dataset for DPO/ORPO. You'll also learn how to make the most of AI feedback, integrating Argilla to gather human feedback and improve the overall data quality.
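
For context, a DPO/ORPO preference record is just a prompt paired with a "chosen" and a "rejected" completion, and AI feedback is what decides the ranking. Here is a rough generic sketch of that idea (not the distilabel API; `generate` and `rate` are hypothetical stand-ins for a generator model and an LLM judge):

```python
# Build one preference record: sample two completions, let an AI judge rank them.
def build_preference_record(prompt, generate, rate):
    a, b = generate(prompt), generate(prompt)          # two candidate completions
    chosen, rejected = (a, b) if rate(prompt, a) >= rate(prompt, b) else (b, a)
    return {"prompt": prompt, "chosen": chosen, "rejected": rejected}
```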

This session is perfect for you:
- if you're getting started with distilabel or synthetic data
- if you want to learn how to use LLM inference endpoints for **free**
- if you want to discover new functionalities
- if you want to give us feedback

Sign up here: https://lu.ma/dt0c7jru
clem posted an update about 7 hours ago
This isn't a goal of ours because we have plenty of money in the bank, but I'm quite excited to see that @huggingface is profitable these days, with 220 team members and most of our platform being free (like model hosting) and open-source for the community!

Especially noteworthy at a time when most AI startups wouldn't survive a year or two without VC money. Yay!
fdaudens posted an update 2 days ago
🚀 How The Washington Post Uses AI to Empower Journalists 🔍📰

An exciting new example in the world of AI-assisted journalism! The Post has developed an internal tool called "Haystacker" that's enhancing in-depth reporting. Here's why it matters:

🎥 What it does (see the sketch below for the general idea):
• Extracts stills from video files
• Processes on-screen text
• Labels objects in images
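
To make the frame-extraction and on-screen-text steps concrete, here is a rough generic sketch (this is not the Post's internal tool) using OpenCV and pytesseract:

```python
# Pull roughly one still per second from a video and OCR any on-screen text.
import cv2
import pytesseract

def extract_text_per_second(video_path):
    cap = cv2.VideoCapture(video_path)
    fps = cap.get(cv2.CAP_PROP_FPS) or 30.0
    results, frame_idx = [], 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if frame_idx % int(fps) == 0:                  # keep one frame per second
            text = pytesseract.image_to_string(frame)  # on-screen text, if any
            results.append((frame_idx / fps, text.strip()))
        frame_idx += 1
    cap.release()
    return results
```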

๐Ÿ—ณ๏ธ First big project:
Analyzed 745 Republican campaign ads on immigration (Jan-Jun 2024)

๐Ÿค Human-AI collaboration:
โ€ข AI extracts and organizes data
โ€ข Reporters verify and analyze findings

🔎 Thorough approach:
• Manual review of all 745 ads
• Reverse image searches when context is lacking
• Cross-referencing with AdImpact transcripts

💡 Key insight from WaPo's Senior Editor for AI strategy, Phoebe Connelly:
"The more exciting choice is putting AI in the hands of reporters early on in the process."

This tool showcases how AI can augment journalistic capabilities without replacing human insight and verification. It's a powerful example of technology enhancing, not replacing, traditional reporting skills.

👉 Read the full article and the methodology: https://www.washingtonpost.com/elections/interactive/2024/republican-campaign-ads-immigration-border-security/
victor posted an update about 6 hours ago
🙋 Calling all Hugging Face users! We want to hear from YOU!

What feature or improvement would make the biggest impact on Hugging Face?

Whether it's the Hub, better documentation, new integrations, or something completely different, we're all ears!

Your feedback shapes the future of Hugging Face. Drop your ideas in the comments below! 👇
chaoxu posted an update about 20 hours ago
🚀 We are proud to release our latest suite of three image(s)-to-3D Gradio demos and two new papers.

- SpaRP (unposed sparse views to 3D): Space sudo-ai/SpaRP, paper SpaRP: Fast 3D Object Reconstruction and Pose Estimation from Sparse Views (2408.10195)
- MeshFormer (@minghua @NCJ): Space sudo-ai/MeshFormer, paper MeshFormer: High-Quality Mesh Generation with 3D-Guided Reconstruction Model (2408.10198)
- MeshLRM-reproduced (@sarahwei0210): Space sudo-ai/MeshLRM
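
If you'd rather script against these demos than use the web UI, gradio_client can introspect a Space's endpoints; the exact inputs each Space expects aren't listed here, so check the printed API before calling predict():

```python
# Discover a Space's callable endpoints programmatically (inputs vary per demo).
from gradio_client import Client

client = Client("sudo-ai/SpaRP")   # or "sudo-ai/MeshFormer", "sudo-ai/MeshLRM"
client.view_api()                  # prints the named endpoints and their parameters
# result = client.predict(..., api_name="/...")  # fill in from the printed API
```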

Many thanks to @angli66 for all his efforts in preparing these demos!
bartowski posted an update 4 days ago
So it turns out I've been spreading a bit of misinformation when it comes to imatrix in llama.cpp.

It starts out true: imatrix runs the model against a corpus of text and tracks the activations to determine which weights are most important.

However, what the quantization then does with that information is where I was wrong.

I think I made an accidental connection between imatrix and ExLlamaV2's measurement pass, where ExLlamaV2 decides how many bits to assign to which weights depending on the target BPW (bits per weight).

Instead, what llama.cpp does with imatrix is try to select a scale for each quantization block that most accurately returns the important weights to their original values, i.e. it minimizes the dequantization error weighted by the importance of the activations.

The mildly surprising part is that it actually does a relatively brute-force search: it picks a bunch of candidate scales, tries each one, and keeps whichever gives the minimum error on the weights deemed important in the block.

But yeah, it turns out the quantization scheme is always the same; it's just that the scaling has a bit more logic to it when you use imatrix.
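
Here is a toy numpy sketch of that idea, an importance-weighted scale search over one block. It is not llama.cpp's actual code (real k-quants use sub-blocks, offsets, and integer packing), just the shape of the search described above:

```python
import numpy as np

def pick_block_scale(weights, importances, n_bits=4, n_candidates=32):
    """Brute-force search for the scale that minimizes importance-weighted
    dequantization error on a single block of weights."""
    qmax = 2 ** (n_bits - 1) - 1                 # quantized values in [-8, 7] for 4-bit
    base = np.max(np.abs(weights)) / qmax        # naive "fit the largest weight" scale
    best_scale, best_err = base, np.inf
    for factor in np.linspace(0.7, 1.3, n_candidates):
        scale = base * factor
        q = np.clip(np.round(weights / scale), -qmax - 1, qmax)
        err = np.sum(importances * (weights - q * scale) ** 2)
        if err < best_err:
            best_scale, best_err = scale, err
    return best_scale

# Example: importances (from an imatrix-style activation pass) skew the search
# toward reproducing the weights that actually matter.
w = np.random.randn(32).astype(np.float32)
imp = np.abs(np.random.randn(32)).astype(np.float32)
scale = pick_block_scale(w, imp)
```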

Huge shoutout to @compilade for helping me wrap my head around it - feel free to add/correct as well if I've messed something up
maxiw posted an update 1 day ago
Just added the newly released xGen-MM v1.5 foundational Large Multimodal Models (LMMs), developed by Salesforce AI Research, to my xGen-MM HF Space: maxiw/XGen-MM
explorewithai posted an update 2 days ago
ChatFrame-Persian is the first expert Persian language model in Iran.
lamhieu posted an update 3 days ago
🚀 We're excited to launch Ghost 8B Beta (1608), a top-performing language model with unmatched multilingual support and cost efficiency.

Key Highlights:
- Superior Performance: Outperforms Llama 3.1 8B Instruct, GPT-3.5 Turbo, Claude 3 Opus, GPT-4, and more in winrate scores.
- Expanded Language Support: Now supports 16 languages, including English, Vietnamese, Spanish, Chinese, and more.
- Enhanced Capabilities: Improved math, reasoning, and instruction-following for better task handling.

With two context options (8k and 128k), Ghost 8B Beta is perfect for complex, multilingual applications, balancing power and cost-effectiveness.

🔗 Learn More: https://ghost-x.org/docs/models/ghost-8b-beta
ghost-x/ghost-8b-beta-668ead6179f93be717db4542