Data mining, or Knowledge Discovery in Databases (KDD), is the process of discovering hidden patterns and knowledge within large amounts of data and making predictions of individual outcomes and behaviours. Data mining techniques are well established in banking, retail and many other commercial service sectors for their power to help understand consumer individuality and improve individualised services. How can data mining be applied to understand learners’ needs and personalise learning?
The ideas summarised in this blog post mainly come from two papers about educational data mining: Zhao and Luan (2006) and Romero and Ventura (2007).
Beyond traditional statistics
Zhao and Luan’s (2006) paper provides an overview of data mining, focusing on the differences and relationships between data mining and traditional statistics. The paper is mainly about data mining and says very little about education. Meanwhile, slowly but surely, the authors make the case that educational research needs both approaches: statistics and data mining. A summary of the main features, side by side:
The core argument is that data mining and statistics are two complementary approaches: data mining is for discovering knowledge and suggesting new hypotheses; statistics is for testing hypotheses and theories. Thus, both are needed in educational research.
Other insightful thoughts from the paper, some original and some not:
- “even though data mining may yield better predictions, the model used to obtain the predictions cannot easily be related back to theory.” (p. 8).
- “To an inductive researcher, hypotheses and theories can only be proved wrong; they cannot be confirmed, no matter how much evidence has been gathered to support a hypothesis.” (p. 9)
- “If statistics starts and ends with an all-encompassing theory, data mining provides concrete information about how to go about the action.” (p. 11)
- “Statistical significance does not automatically imply practical significance.” (p. 13)
- “If you torture data long enough it will confess to anything” (p. 8)
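The last two quotes can be made concrete with a little arithmetic. The sketch below is an illustration added here, not taken from either paper, and all numbers in it are hypothetical. It assumes two groups of learners whose mean scores on a 100-point test differ by half a point, with a common standard deviation of 15, and uses a simple two-sample z-test (equal group sizes, known common SD) alongside Cohen's d as an effect size. It also computes the chance of at least one false positive when many independent tests are run at the same significance level, which is one way data gets "tortured".

```python
import math


def two_sample_z_test(mean_a, mean_b, sd, n_per_group):
    """Two-sided z-test for a difference in means.

    Simplifying assumptions for illustration: both groups share a known
    standard deviation `sd` and have the same size `n_per_group`.
    """
    standard_error = sd * math.sqrt(2.0 / n_per_group)
    z = (mean_a - mean_b) / standard_error
    # Two-sided p-value from the standard normal: 2 * (1 - Phi(|z|)).
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p_value


def cohens_d(mean_a, mean_b, sd):
    """Standardised mean difference (Cohen's d) using a common SD."""
    return (mean_a - mean_b) / sd


def familywise_error_rate(alpha, k):
    """Probability of at least one false positive among k independent
    tests, each run at level alpha, when no real effect exists."""
    return 1.0 - (1.0 - alpha) ** k


# Hypothetical summary statistics for two groups of learners.
MEAN_A, MEAN_B, SD = 70.5, 70.0, 15.0

for n in (100, 100_000):
    z, p = two_sample_z_test(MEAN_A, MEAN_B, SD, n)
    d = cohens_d(MEAN_A, MEAN_B, SD)
    print(f"n={n:>7}: z={z:.2f}, p={p:.3g}, d={d:.3f}")

# Twenty independent "fishing" tests at the conventional 0.05 level.
print(f"P(at least one false positive) = {familywise_error_rate(0.05, 20):.2f}")
```

With 100 learners per group the half-point difference is nowhere near significant, while with 100,000 per group the p-value falls far below 0.05; yet Cohen's d stays at about 0.03 in both cases, far below the 0.2 conventionally labelled a "small" effect. Likewise, running twenty independent tests on data with no real effect gives roughly a 64% chance that at least one comes out "significant".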
Ten years of educational data mining & the way ahead
Romero and Ventura’s (2007) review summarises research papers on educational data mining published in journals and conference proceedings over the decade from 1995 to 2005. In the authors’ own summary:
“This paper surveys the application of data mining to traditional educational systems, particular web-based courses, well-known learning content management systems, and adaptive and intelligent web-based educational systems. Each of these systems has different data source and objectives for knowledge discovering. After pre-processing the available data in each case, data mining techniques can be applied: statistics and visualization; clustering, classification and outlier detection; association rule mining and pattern mining; and text mining. The success of the plentiful work needs much more specialized work in order for educational data mining to become a mature area." (p. 135)
The paper is essentially written by data miners for data miners. Thus, it focuses on data types, algorithms and application domains rather than on the educational value of the findings. The most interesting and thought-provoking part is the conclusions, which outline future lines of research:
- Reducing the complexity of data mining tools for educators. Data mining tools are designed more for power and flexibility than for simplicity. Most tools are too complex for educators to use, and their features go well beyond the scope of what educators may want to do with them.
- Standardisation of methods and data. Current tools for mining data from specific courses are used only by their developers. There is a need for standard techniques and data formats, so that tools can be applied in various educational contexts.
- Integration with the e-learning system. Results obtained with data mining can be applied directly in e-learning environments to guide the learning process. Thus, data mining tools should be integrated into e-learning systems.
- Integration of educational expert knowledge. Effective mining tools integrate educational domain knowledge into data mining techniques. Traditional mining algorithms need to be tuned to take into account the educational knowledge of the context.
The reference list implicitly conveys two impressions:
- Educational data mining is a young research domain: of the roughly 80 papers reviewed, about three-quarters were published in conference proceedings and only a quarter in research journals.
- Educational data mining is the realm of computer scientists and is barely accessible to educators: about two-thirds of the papers appeared in pure computer science or artificial intelligence proceedings and journals. The remaining third was published in educational journals and proceedings that mainly focus on ICT applications in education (e.g., Computers and Education, ICALT).
Nevertheless, “Our students expect individualized attention in learning. More individualized and refined research is urgently needed to inform institutions’ decision making and to enhance student learning.” (Zhao & Luan, 2006, p. 7). So, what is the future of educational research: statistics, knowledge discovery, or both? Well, the future will surely bring something new.
Comments
One of the challenges faced in leading data mining of large databases such as IPEDS is resistance from practitioners of traditional educational research. Letting the data pose the question is very disturbing to those who allow the question to drive the collection of data. The excitement of quantitative discovery makes the journey worthwhile.
Posted by: Rusty Waller | August 20, 2008 02:18 PM