[New blog post] Unified multimodal large model evaluation, accelerating multimodal intelligence emergence #1987
Changes from 20 commits
| Original file line number | Diff line number | Diff line change | ||||
|---|---|---|---|---|---|---|
| @@ -0,0 +1,102 @@ | ||||||
| --- | ||||||
| title: "Unified multimodal large model evaluation, accelerating multimodal intelligence emergence" | ||||||
| thumbnail: /blog/assets/lmms_eval/thumbnail.png | ||||||
| authors: | ||||||
| - user: luodian | ||||||
| guest: true | ||||||
pcuenca marked this conversation as resolved.
|
||||||
| - user: PY007 | ||||||
| guest: true | ||||||
| - user: kcz358 | ||||||
| guest: true | ||||||
| - user: pufanyi | ||||||
| guest: true | ||||||
| - user: JvThunder | ||||||
| guest: true | ||||||
| - user: dododododo | ||||||
| guest: true | ||||||
| - user: THUdyh | ||||||
| guest: true | ||||||
| - user: liuhaotian | ||||||
| guest: true | ||||||
| - user: ZhangYuanhan | ||||||
| guest: true | ||||||
| - user: zhangysk | ||||||
| guest: true | ||||||
| - user: Chunyuan24 | ||||||
| guest: true | ||||||
| - user: liuziwei7 | ||||||
| guest: true | ||||||
| --- | ||||||
kcz358 marked this conversation as resolved.
|
||||||
| # Unified multimodal large model evaluation, accelerating multimodal intelligence emergence | ||||||
|
||||||
|
|
||||||
| GitHub repo : https://github.com/EvolvingLMMs-Lab/lmms-eval | ||||||
|
|
||||||
| Official website : https://lmms-lab.github.io/ | ||||||
|
Comment on lines +41 to +43
Member
I'd maybe move these links to the end of the intro (and, optionally, also to a "Resources" section at the end of the post). At this point, the reader knows nothing about what this is about, so they have little incentive to click, imo.
||||||
|
|
||||||
| With the deepening development of artificial intelligence research, multimodal large models such as GPT-4V and LLaVA have become hot topics in both academia and industry. However, these advanced models require an effective evaluation framework to accurately measure their performance, which is not an easy task. On the one hand, the diverse prompts and post-processing methods adopted by different models may lead to significant differences in performance evaluation results, as illustrated by HuggingFace's mention of "1001 flavors of MMLU" in their blog post, indicating that different implementations of the same evaluation dataset may result in significant score differences, even changing the model's ranking on leaderboards. | ||||||
kcz358 marked this conversation as resolved.
|
||||||
|
|
||||||
| Another challenge lies in data acquisition and processing during the evaluation process, especially when dealing with old datasets that are not widely available. Researchers often need to invest a considerable amount of time and effort in manual searching, downloading, and processing. | ||||||
kcz358 marked this conversation as resolved.
|
||||||
|
|
||||||
| To address these issues, researchers from Nanyang Technological University, ByteDance, and other institutions have jointly open-sourced `lmms-eval`, which is an evaluation framework designed specifically for multimodal large models. Building upon EleutherAI's [`lm-evaluation-harness`](https://github.com/EleutherAI/lm-evaluation-harness) and [🤗 Accelerate](https://github.com/huggingface/accelerate), this framework has been improved and expanded to provide a unified interface for defining models, datasets, and evaluation metrics, offering a one-stop, efficient solution for evaluating large multimodal models (LMMs). We hope that through this framework, we can collectively drive the iteration cycle of multimodal models and promote their broader application in academia and industry. We sincerely look forward to witnessing more breakthroughs and innovations in the field of multimodal AI, jointly advancing towards a more efficient and intelligent future development of artificial intelligence technology. | ||||||
|
||||||
| To address these issues, researchers from Nanyang Technological University, ByteDance, and other institutions have jointly open-sourced `lmms-eval`, which is an evaluation framework designed specifically for multimodal large models. Building upon EleutherAI's [`lm-evaluation-harness`](https://github.com/EleutherAI/lm-evaluation-harness) and [🤗 Accelerate](https://github.com/huggingface/accelerate), this framework has been improved and expanded to provide a unified interface for defining models, datasets, and evaluation metrics, offering a one-stop, efficient solution for evaluating large multimodal models (LMMs). We hope that through this framework, we can collectively drive the iteration cycle of multimodal models and promote their broader application in academia and industry. We sincerely look forward to witnessing more breakthroughs and innovations in the field of multimodal AI, jointly advancing towards a more efficient and intelligent future development of artificial intelligence technology. | |
| To address these issues, researchers from Nanyang Technological University, ByteDance, and other institutions have jointly open-sourced `lmms-eval`, which is an evaluation framework designed specifically for large multimodal models (LMMs). Building upon EleutherAI's [`lm-evaluation-harness`](https://github.com/EleutherAI/lm-evaluation-harness) and [🤗 Accelerate](https://github.com/huggingface/accelerate), this framework has been improved and expanded to provide a unified interface for defining models, datasets, and evaluation metrics, offering a one-stop, efficient solution for evaluating LMMs. We hope that through this framework, we can collectively drive the iteration cycle of multimodal models and promote their broader application in academia and industry. We sincerely look forward to witnessing more breakthroughs and innovations in the field of multimodal AI, jointly advancing towards a more efficient and intelligent future development of artificial intelligence technology. |
Outdated
Would be nice to link directly to lmms-eval instead of putting it in code formatting.
Outdated
| ## Overview of the main features | |
| ## Main features |
Maybe?
Outdated
Again, maybe capitalize "Main" and "Features".
kcz358 marked this conversation as resolved.
kcz358 marked this conversation as resolved.
kcz358 marked this conversation as resolved.
kcz358 marked this conversation as resolved.
Outdated
kcz358 marked this conversation as resolved.
Outdated
Should we use 4 for num_processes in the command line invocation, or is it unrelated?
This question still stands. The previous code snippet showed accelerate running on 8 GPUs.
I think it is unrelated: the code snippet is for demonstration only, since we ran the baseline tests on different datasets.
kcz358 marked this conversation as resolved.
Outdated
kcz358 marked this conversation as resolved.
Outdated
Same comment about the image link.
kcz358 marked this conversation as resolved.
Outdated
I don't think these links will be embedded correctly as images (they are references to the github tree)
Hi, I tried changing the src to a link in a Hugging Face dataset repo, but I can't see the rendered image on GitHub. What is the proper way to put an image link in the blog?
I have uploaded all the images here, but I can't find a way to get GitHub Markdown to render them.
Reminder to update date before release :)
(Also I'd move the entry to the end of the file, just in case)