[New blog post] Unified multimodal large model evaluation, accelerating multimodal intelligence emergence #1987
---
title: "Unified multimodal large model evaluation, accelerating multimodal intelligence emergence"
thumbnail: https://github.com/lmms-lab/lmms-eval-blog/blob/master/assets/img/lmms-eval-header.png
authors:
- user: luodian
  guest: true
- user: PY007
  guest: true
- user: kcz358
  guest: true
- user: pufanyi
  guest: true
- user: JvThunder
  guest: true
- user: dododododo
  guest: true
- user: THUdyh
  guest: true
- user: liuhaotian
  guest: true
- user: ZhangYuanhan
  guest: true
- user: zhangysk
  guest: true
- user: Chunyuan24
  guest: true
- user: liuziwei7
  guest: true
---
# Unified multimodal large model evaluation, accelerating multimodal intelligence emergence

GitHub repo: https://github.com/EvolvingLMMs-Lab/lmms-eval

Official website: https://lmms-lab.github.io/
Review comment (on lines +41 to +43): I'd maybe move these links to the end of the intro (and, optionally, also to a "Resources" section at the end of the post). At this point, the reader knows nothing about what this is about so they have little incentive to click imo.
As artificial intelligence research deepens, multimodal large models such as GPT-4V and LLaVA have become hot topics in both academia and industry. However, accurately measuring the performance of these models requires an effective evaluation framework, which is not easy to build. On the one hand, the diverse prompts and post-processing methods adopted by different models can lead to large differences in evaluation results. Hugging Face illustrated this with the "1001 flavors of MMLU" example in their blog post: different implementations of the same evaluation dataset can produce significantly different scores, even changing a model's ranking on leaderboards.
Another challenge lies in acquiring and processing evaluation data, especially for older datasets that are not widely available; researchers often have to spend considerable time and effort manually searching for, downloading, and processing them.
To address these issues, researchers from Nanyang Technological University, ByteDance, and other institutions have jointly open-sourced lmms-eval, an evaluation framework designed specifically for multimodal large models. Built on top of lm-evaluation-harness, the framework has been extended to provide a unified interface for defining models, datasets, and evaluation metrics, offering a one-stop, efficient solution for evaluating multimodal models (LMMs). We hope this framework helps shorten the iteration cycle of multimodal models and promotes their broader application in academia and industry, and we look forward to more breakthroughs and innovations in the field of multimodal AI.
Suggested change: replace "evaluating multimodal models (LMMs)" with "evaluating large multimodal models (LMMs)".
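To make the "unified interface for defining models, datasets, and evaluation metrics" more concrete, here is an illustrative sketch of what a task definition could look like. It follows the YAML task schema of lm-evaluation-harness, which lmms-eval builds on; the field names, prompt template, and dataset path below are assumptions for illustration, not copied from the lmms-eval repository.

```yaml
# Illustrative task config (schema borrowed from lm-evaluation-harness, which
# lmms-eval extends; actual lmms-eval task files may use different fields).
task: mme                          # task name selected via --tasks
dataset_path: lmms-lab/MME         # assumed Hugging Face dataset id
test_split: test
output_type: generate_until
doc_to_text: "{{question}}\nAnswer the question using a single word or phrase."
doc_to_target: "{{answer}}"
generation_kwargs:
  max_new_tokens: 16
metric_list:
  - metric: exact_match
    higher_is_better: true
```

The idea is that a new benchmark is added by pointing a config like this at a dataset and then selecting it with `--tasks`, as in the launch command shown further down.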
Suggested change: replace the heading "## Overview of the main features" with "## Main features". Reviewer note: "Maybe?"
Review comment: again, maybe uppercase "Main" and "Features".
Suggested change: add an install hint and the `--multi_gpu` flag, and split the single-line launch command across lines:

```bash
# pip install git+https://github.com/huggingface/lmms-eval.git
accelerate launch --multi_gpu --num_processes=8 -m lmms_eval \
    --model llava \
    --model_args pretrained="liuhaotian/llava-v1.5-7b" \
    --tasks mme,mmbench_en \
    --batch_size 1 \
    --log_samples \
    --log_samples_suffix llava_v1.5_mme_mmbenchen \
    --output_path ./logs
```
Reply: I think I will change the link to our current repo, since the HF forked repo is somewhat behind, and I will also add `pip install git+https://github.com/haotian-liu/LLaVA.git`.
Review comment: Should we use 4 for num_processes in the command line invocation, or is it unrelated?
Review comment: This question still stands. The previous code snippet showed accelerate running on 8 GPUs.
Reply: I think it is somewhat unrelated, because the code snippet is for demonstration only; we run the baseline tests on different datasets.
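For readers following along, the only GPU-related knob in the command is `--num_processes` (with `accelerate launch --multi_gpu`, one process is started per GPU); a 4-GPU run of the same example would look like the sketch below, with everything else unchanged.

```bash
# Same demonstration command as above, adjusted for a node with 4 GPUs.
accelerate launch --multi_gpu --num_processes=4 -m lmms_eval \
    --model llava \
    --model_args pretrained="liuhaotian/llava-v1.5-7b" \
    --tasks mme,mmbench_en \
    --batch_size 1 \
    --log_samples \
    --log_samples_suffix llava_v1.5_mme_mmbenchen \
    --output_path ./logs
```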
Review comment: Same comment about the image link.
Review comment: I don't think these links will be embedded correctly as images (they are references to the GitHub tree).
Reply: Hi, I tried changing the src to a link on a Hugging Face dataset repo, but I can't see the rendered image on GitHub. May I ask what the proper way is to put image links in the blog? I have uploaded all the images here but haven't found a way to get GitHub Markdown to render them.
Review comment: I believe this should live in the `blog` repo directly to render on hf.co/blog, and note that you also need to add the blog details to `_blog.yml`. See here for an example: https://github.com/huggingface/blog/pull/2021/files#diff-a332b83464cf2b650715bacb6e3f07b994af0790acc88a4ea353883ba2ae751eR3853
Reply: Thank you! I have also noticed that in `_blog.yml`, we can only have one author on the list?
Review comment: Yes, that's just for the thumbnail, but the blog post itself will show all authors:
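For context, an entry in `_blog.yml` follows the pattern below. This is a sketch based on existing entries in the huggingface/blog repository; the slug, thumbnail path, date, and tags are placeholders rather than the values actually used for this post.

```yaml
# Hypothetical _blog.yml entry -- placeholder values, shown only to illustrate
# the single author field used for the thumbnail card.
- local: lmms-eval                                # placeholder slug
  title: "Unified multimodal large model evaluation, accelerating multimodal intelligence emergence"
  author: luodian                                 # one user shown on the card; full author list lives in the post's front matter
  thumbnail: /blog/assets/lmms_eval/thumbnail.png # placeholder path
  date: June 1, 2024                              # placeholder date
  tags:
    - multimodal
    - evaluation
```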