Model Leaderboard
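Performance comparison of open-source, psychology-oriented, and API-based models on the CPsyExam benchmark. Each table reports accuracy (%) on single-choice questions (SCQ) and multiple-answer questions (MAQ). "Zero" and "Five" denote zero-shot and five-shot prompting, and Avg is the overall score across both question types in that setting.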
CPsyExam-KG (Knowledge)
Model | SCQ (Zero) | MAQ (Zero) | SCQ (Five) | MAQ (Five) | Avg (Zero) | Avg (Five) |
---|---|---|---|---|---|---|
Open-source Models | ||||||
ChatGLM2-6B | 49.89% | 9.86% | 53.81% | 14.85% | 39.81% | 44.00% |
ChatGLM3-6B | 53.51% | 5.63% | 55.75% | 5.51% | 41.46% | 43.10% |
YI-6B | 33.26% | 0.26% | 25.39% | 14.01% | 24.95% | 22.31% |
QWEN-14B | 24.99% | 1.54% | 38.17% | 13.19% | 19.08% | 31.88% |
YI-34B | 25.03% | 1.15% | 33.69% | 18.18% | 24.95% | 22.31% |
Psychology-oriented Models | ||||||
MeChat-6B | 50.24% | 4.10% | 51.79% | 11.91% | 38.62% | 41.75% |
MindChat-7B | 49.25% | 6.27% | 56.92% | 5.51% | 38.43% | 43.97% |
MindChat-8B | 26.50% | 0.00% | 26.50% | 0.13% | 19.83% | 19.86% |
Ours-SFT-6B | 52.95% | 10.50% | 58.77% | 2.94% | 42.26% | 44.71% |
API-based Models | ||||||
ERNIE-Bot | 52.48% | 6.66% | 56.10% | 10.37% | 40.94% | 44.58% |
ChatGPT | 57.43% | 11.14% | 61.53% | 24.71% | 45.78% | 52.26% |
ChatGLM | 63.29% | 26.12% | 73.85% | 42.13% | 53.93% | 65.86% |
GPT4 | 76.56% | 10.76% | 78.63% | 43.79% | 59.99% | 69.85% |
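The table does not say how the Avg columns are computed. Suppose, as an assumption, that Avg is a weighted mean of SCQ and MAQ accuracy, for example weighted by how many questions of each type the split contains. Then the SCQ weight implied by any row is w = (Avg − MAQ) / (SCQ − MAQ). The sketch below computes that implied weight for a few KG rows. The helper name `implied_scq_weight` is hypothetical, and the weighting scheme is an assumption rather than something this page states.

```python
"""Sketch: recover the SCQ weight implied by a leaderboard row.

Assumption (not stated in the leaderboard): Avg = w * SCQ + (1 - w) * MAQ.
"""


def implied_scq_weight(scq: float, maq: float, avg: float) -> float:
    """Solve avg = w * scq + (1 - w) * maq for w (hypothetical helper)."""
    if scq == maq:
        # With equal accuracies every weight gives the same Avg.
        raise ValueError("SCQ and MAQ are equal; the weight is undetermined")
    return (avg - maq) / (scq - maq)


# (model, setting, SCQ, MAQ, Avg) copied from the CPsyExam-KG table above.
KG_ROWS = [
    ("ChatGLM2-6B", "zero-shot", 49.89, 9.86, 39.81),
    ("ChatGPT", "five-shot", 61.53, 24.71, 52.26),
    ("GPT4", "zero-shot", 76.56, 10.76, 59.99),
    ("GPT4", "five-shot", 78.63, 43.79, 69.85),
]

if __name__ == "__main__":
    for model, setting, scq, maq, avg in KG_ROWS:
        w = implied_scq_weight(scq, maq, avg)
        print(f"{model:12s} {setting:9s} implied SCQ weight = {w:.3f}")
```

Each of these rows gives an implied weight of about 0.748, which is roughly three SCQ items per MAQ item. Because the published scores are rounded to two decimals, the recovered weight is only approximate.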
CPsyExam-CA (Case Analysis)
Model | SCQ (Zero) | MAQ (Zero) | SCQ (Five) | MAQ (Five) | Avg (Zero) | Avg (Five) |
---|---|---|---|---|---|---|
Open-source Models | ||||||
ChatGLM2-6B | 52.50% | 16.00% | 48.50% | 20.00% | 43.38% | 41.38% |
ChatGLM3-6B | 47.00% | 17.00% | 47.33% | 13.50% | 39.50% | 38.88% |
YI-6B | 38.83% | 0.00% | 20.00% | 13.25% | 29.12% | 18.63% |
QWEN-14B | 20.33% | 2.00% | 30.00% | 14.00% | 15.75% | 26.00% |
YI-34B | 20.50% | 0.50% | 22.33% | 8.00% | 15.50% | 19.39% |
Psychology-oriented Models | ||||||
MeChat-6B | 48.67% | 13.50% | 44.83% | 10.50% | 39.86% | 36.25% |
MindChat-7B | 40.83% | 5.00% | 33.83% | 4.50% | 31.88% | 26.50% |
MindChat-8B | 34.17% | 0.00% | 34.17% | 0.00% | 25.63% | 25.63% |
Ours-SFT-6B | 46.50% | 5.50% | 48.67% | 13.00% | 34.00% | 41.00% |
API-based Models | ||||||
ERNIE-Bot | 42.50% | 8.50% | 50.67% | 12.00% | 34.00% | 41.00% |
ChatGPT | 47.33% | 9.00% | 52.67% | 29.50% | 37.75% | 46.88% |
ChatGLM | 69.00% | 20.50% | 65.33% | 42.50% | 56.88% | 59.63% |
GPT4 | 60.33% | 13.00% | 64.17% | 39.50% | 48.50% | 58.00% |
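In the CA table the numbers fit a simple fixed rule. Assume that Avg weights SCQ and MAQ accuracy 3:1, so w = 0.75. This is inferred from the rows and is not stated on this page. Under that assumption, the Avg columns can be recomputed from the SCQ and MAQ columns. The helper `weighted_avg` and the constant `CA_SCQ_WEIGHT` are hypothetical names used only for illustration.

```python
"""Sketch: recompute CPsyExam-CA Avg scores from SCQ and MAQ accuracy.

Assumption (inferred, not stated in the leaderboard): Avg = 0.75 * SCQ + 0.25 * MAQ.
"""

CA_SCQ_WEIGHT = 0.75  # hypothetical 3:1 SCQ:MAQ weighting


def weighted_avg(scq: float, maq: float, w: float = CA_SCQ_WEIGHT) -> float:
    """Return w * SCQ + (1 - w) * MAQ, with both accuracies given in percent."""
    return w * scq + (1.0 - w) * maq


# (model, SCQ, MAQ, reported Avg) for the zero-shot setting, from the CA table above.
CA_ZERO_SHOT = [
    ("ChatGLM2-6B", 52.50, 16.00, 43.38),
    ("ChatGPT", 47.33, 9.00, 37.75),
    ("GPT4", 60.33, 13.00, 48.50),
]

if __name__ == "__main__":
    for model, scq, maq, reported in CA_ZERO_SHOT:
        recomputed = weighted_avg(scq, maq)
        print(f"{model:12s} recomputed {recomputed:6.2f}  reported {reported:6.2f}")
```

For these rows, the recomputed values match the reported Avg to within the two-decimal rounding of the table.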