Scoring and Assessment Systems
DOI:
https://doi.org/10.56832/edu.v5i3.2408
Keywords:
AI Grading, Fairness, Scoring, Automated Assessment, Reliability, Rubrics, Assessment Systems, Validity
Abstract
Scoring and assessment systems are at the core of educational and training assessment because they determine the quality of academic decisions: passing, ranking, feedback, and the improvement of learning. The main challenge in modern assessment is maintaining validity (whether scores truly represent the intended competence), reliability (consistency of scores), fairness (minimal bias), and utility (support for learning). This article discusses the concepts, design, and implementation of assessment systems, from classical approaches (analytic and holistic rubrics, rating scales, weighting) to measurement-based approaches (generalizability theory, many-facet Rasch) and recent trends such as automated assessment of essays and complex tasks using artificial intelligence (AI). The method is a focused literature review (2021–2025) combined with conceptual synthesis. The discussion yields a design framework for assessment systems that can be applied in schools and higher education: (1) alignment of learning outcomes, tasks, and criteria, (2) selection of a transparent scoring model and weighting scheme, (3) rater calibration and control of rater variation, (4) verification of reliability and validity evidence, (5) data governance and fairness audits, and (6) integration of formative feedback. The article closes with practical recommendations: use "auditable" rubrics, apply multi-facet reliability analysis to rater-scored tasks, and approach AI grading with caution through bias evaluation, data security, and continuous validation.
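As a rough illustration of framework items (2) and (3), the sketch below computes a transparently weighted score from an analytic rubric and a simple agreement index for two raters. The criterion names, weights, and the 0 to 4 level scale are hypothetical and are not taken from the article. Cohen's kappa is used here only as a lightweight stand-in for a first calibration check; it is not the generalizability-theory or many-facet Rasch analysis that the abstract refers to.

```python
from collections import Counter

# Hypothetical analytic rubric: criterion -> weight (weights sum to 1.0).
WEIGHTS = {"content": 0.4, "organization": 0.3, "language": 0.3}
MAX_LEVEL = 4  # assumed 0-4 level scale for every criterion


def weighted_score(levels, weights=WEIGHTS, max_level=MAX_LEVEL):
    """Return a 0-100 score from per-criterion rubric levels."""
    if set(levels) != set(weights):
        raise ValueError("levels must cover exactly the rubric criteria")
    # Normalize each level to 0-1, then apply the published weight.
    total = sum(weights[c] * levels[c] / max_level for c in weights)
    return 100.0 * total


def cohen_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters assigning categorical levels."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("need two equal-length, non-empty rating lists")
    n = len(rater_a)
    # Proportion of items on which the raters gave the same level.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Agreement expected by chance from each rater's level frequencies.
    count_a, count_b = Counter(rater_a), Counter(rater_b)
    expected = sum(count_a[k] * count_b[k] for k in count_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used one identical level throughout
    return (observed - expected) / (1.0 - expected)


if __name__ == "__main__":
    print(weighted_score({"content": 4, "organization": 2, "language": 3}))
    print(cohen_kappa([1, 2, 3, 3, 2, 1, 4, 4], [1, 2, 3, 2, 2, 1, 4, 3]))
```

Keeping the weights in one visible table is one way to make a rubric auditable, since anyone can recompute a score from the recorded levels. A low kappa would only signal that rater calibration deserves attention; it does not separate rater severity from task difficulty the way a multi-facet analysis would.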