135.Webb, G.I. (2010). Self-Sufficient Itemsets: An Approach to Screening Potentially Interesting Associations Between Items. Transactions on Knowledge Discovery from Data 4. ACM, pages 3:1-3:20.[Pre-Publication PDF][Link to paper via ACM Digital Library] Abstract:Self-sufficient itemsets are those whose frequency cannot explained solely by the frequency of either their subsets or of their supersets. We argue that itemsets that are not self-sufficient will often be of little interest to the data analyst, as their frequency should be expected once that of the itemsets on which their frequency depends is known. We present statistical tests for statistically sound discovery of self-sufficient itemsets, and computational techniques that allow those tests to be applied as a post-processing step for any itemset discovery algorithm. We also present a measure for assessing the degree of potential interest in an itemset that complements these statistical measures.Keywords: Association Rule Discovery, statistically sound discovery, OPUS
134.Song, J., H. Tan, K. Mahmood, R.H.P. Law, A.M. Buckle, G.I. Webb, T. Akutsu, and J.C. Whisstock (2009). Prodepth: Predict Residue Depth by Support Vector Regression Approach from Protein Sequences Only. PLoS ONE 4(9). PLOS, pages e7072.[Link to paper] Abstract:Residue depth (RD) is a solvent exposure measure that complements the information provided by conventional accessible surface area (ASA) and describes to what extent a residue is buried in the protein structure space. Previous studies have established that RD is correlated with several protein properties, such as protein stability, residue conservation and amino acid types. Accurate prediction of RD has many potentially important applications in the field of structural bioinformatics, for example, facilitating the identification of functionally important residues, or residues in the folding nucleus, or enzyme active sites from sequence information. In this work, we introduce an efficient approach that uses support vector regression to quantify the relationship between RD and protein sequence. We systematically investigated eight different sequence encoding schemes including both local and global sequence characteristics and examined their respective prediction performances. For the objective evaluation of our approach, we used 5-fold cross-validation to assess the prediction accuracies and showed that the overall best performance could be achieved with a correlation coefficient (CC) of 0.71 between the observed and predicted RD values and a root mean square error (RMSE) of 1.74, after incorporating the relevant multiple sequence features. The results suggest that residue depth could be reliably predicted solely from protein primary sequences: local sequence environments are the major determinants, while global sequence features could influence the prediction performance marginally. We highlight two examples as a comparison in order to illustrate the applicability of this approach. We also discuss the potential implications of this new structural parameter in the field of protein structure prediction and homology modeling. This method might prove to be a powerful tool for sequence analysis.Keywords: Bioinformatics
133.Hui, B., Y. Yang, and G.I. Webb (2009). Anytime Classification for a Pool of Instances. Machine Learning 77(1). Netherlands: Springer, pages 61-102.[Pre-publication PDF][Link to paper via Springerlink] Abstract:In many real-world applications of classification learning, such as credit card transaction vetting or classification embedded in sensor nodes, multiple instances simultaneously require classification under computational resource constraints such as limited time or limited battery capacity. In such a situation, available computational resources should be allocated across the instances in order to optimize the overall classification efficacy and efficiency. We propose a novel anytime classification framework, Scheduling Anytime Averaged Probabilistic Estimators (SAAPE), which is capable of classifying a pool of instances, delivering accurate results whenever interrupted and optimizing the collective classification performance. Following the practice of our previous anytime classification system AAPE, SAAPE runs a sequence of very efficient Bayesian probabilistic classifiers to classify each single instance. Furthermore, SAAPE implements seven alternative scheduling schemes to decide which instance gets available computational resources next such that a new classifier can be applied to refine its classification. We formally present each scheduling scheme's definition, rationale and time complexity. We conduct large-scale experiments using 60 benchmark data sets and diversified statistical tests to evaluate SAAPE's performance on zero-one loss classification as well as on probability estimation. We analyze each scheduling scheme's advantage and disadvantage according to both theoretical understandings and empirical observations. Consequently we identify effective scheduling schemes that enable SAAPE to accomplish accurate anytime classification for a pool of instances.Keywords: Conditional Probability Estimation,AODE
132.Yang, Y. and G.I. Webb (2009). Discretization for Naive-Bayes Learning: Managing Discretization Bias and Variance. Machine Learning 74(1). Netherlands: Springer, pages 39-74.[Pre-Publication PDF][Link to paper via Springerlink] Abstract:Quantitative attributes are usually discretized in Naive-Bayes learning. We establish simple conditions under which discretization is equivalent to use of the true probability density function during naive-Bayes learning. The use of different discretization techniques can be expected to affect the classification bias and variance of generated naive-Bayes classifiers, effects we name discretization bias and variance. We argue that by properly managing discretization bias and variance, we can effectively reduce naive-Bayes classification error. In particular, we supply insights into managing discretization bias and variance by adjusting the number of intervals and the number of training instances contained in each interval. We accordingly propose proportional discretization and fixed frequency discretization, two efficient unsupervised discretization methods that are able to effectively manage discretization bias and variance. We evaluate our new techniques against four key discretization methods for naive-Bayes classifiers. The experimental results support our theoretical analyses by showing that with statistically significant frequency, naive-Bayes classifiers trained on data discretized by our new methods are able to achieve lower classification error than those trained on data discretized by current established discretization methods.Keywords: Discretization for Naive Bayes,Conditional Probability Estimation,AODE
131.Novak, P., N. Lavrac, and G.I. Webb (2009). Supervised Descriptive Rule Discovery: A Unifying Survey of Contrast Set, Emerging Pattern and Subgroup Mining. Journal of Machine Learning Research 10., pages 377-403.[Link to paper on JMLR site ] Abstract:This paper gives a survey of contrast set mining (CSM), emerging pattern mining (EPM), and subgroup discovery (SD) in a unifying framework named supervised descriptive rule discovery. While all these research areas aim at discovering patterns in the form of rules induced from labeled data, they use different terminology and task definitions, claim to have different goals, claim to use different rule learning heuristics, and use different means for selecting subsets of induced patterns. This paper contributes a novel understanding of these subareas of data mining by presenting a unified terminology, by explaining the apparent differences between the learning tasks as variants of a unique supervised descriptive rule discovery task and by exploring the apparent differences between the approaches. It also shows that various rule learning heuristics used in CSM, EPM and SD algorithms all aim at optimizing a trade off between rule coverage and precision. The commonalities (and differences) between the approaches are showcased on a selection of best known variants of CSM, EPM and SD algorithms. The paper also provides a critical survey of existing supervised descriptive rule discovery visualization methods.Keywords: Association Rule Discovery, OPUS
130.Webb, G.I. (2008). Layered Critical Values: A Powerful Direct-Adjustment Approach to Discovering Significant Patterns. Machine Learning 71(2-3). Netherlands: Springer, pages 307-323 [Technical Note].[Pre-Publication PDF][Link to paper via Springerlink] Abstract:Standard pattern discovery techniques, such as association rules, suffer an extreme risk of finding very large numbers of spurious patterns for many knowledge discovery tasks. The direct-adjustment approach to controlling this risk applies a statistical test during the discovery process, using a critical value adjusted to take account of the size of the search space. However, a problem with the direct-adjustment strategy is that it may discard numerous true patterns. This paper investigates the assignment of different critical values to different areas of the search space as an approach to alleviating this problem, using a variant of a technique originally developed for other purposes. This approach is shown to be effective at increasing the number of discoveries while still maintaining strict control over the risk of false discoveries.Keywords: Association Rule Discovery, statistically sound discovery, OPUS
129.Yang, Y., G.I. Webb, K. Korb, and K-M. Ting (2007). Classifying under Computational Resource Constraints: Anytime Classification Using Probabilistic Estimators. Machine Learning 69(1). Netherlands: Springer, pages 35-53.[Pre-publication PDF][Link to paper via Springerlink] Abstract:In many online applications of machine learning, the computational resources available for classification will vary from time to time. Most techniques are designed to operate within the constraints of the minimum expected resources and fail to utilize further resources when they are available. We propose a novel anytime classification algorithm, anytime averaged probabilistic estimators (AAPE), which is capable of delivering strong prediction accuracy with little CPU time and utilizing additional CPU time to increase classification accuracy. The idea is to run an ordered sequence of very efficient Bayesian probabilistic estimators (single improvement steps) until classification time runs out. Theoretical studies and empirical validations reveal that by properly identifying, ordering, invoking and ensembling single improvement steps, AAPE is able to accomplish accurate classification whenever it is interrupted. It is also able to output class probability estimates beyond simple 0/1-loss classifications, as well as adeptly handle incremental learning.Keywords: Conditional Probability Estimation,AODE
128.Webb, G.I. (2007). Discovering Significant Patterns. Machine Learning 68(1). Netherlands: Springer, pages 1-33.[Pre-publication PDF][Link to paper via Springerlink] Abstract:Exploratory pattern discovery techniques, such as association rule discovery, explore large search spaces of potential patterns to find those that satisfy some user-specified constraints. Due to the large number of patterns considered, they suffer from an extreme risk of type-1 error, that is, of finding patterns that appear due to chance alone to satisfy the constraints on the sample data. This paper proposes techniques to overcome this problem by applying well-established statistical practices. These allow the user to enforce a strict upper limit on the risk of experimentwise error. Empirical studies demonstrate that standard exploratory pattern discovery techniques can discover numerous spurious patterns when applied to random data and when applied to real-world data result in large numbers of patterns that are rejected when subjected to statistical evaluation on holdout data. They also reveal that modification of the pattern discovery process to anticipate subsequent statistical evaluation can increase the number of patterns that are accepted by statistical evaluation on holdout data.Keywords: Association Rule Discovery, statistically sound discovery, OPUS
127.Faux, N.G., G.A. Huttley, K. Mahmood, G.I. Webb, M. Garcia de la Banda, and J.C. Whisstock (2007). RCPdb: An evolutionary classification and codon usage database for repeat-containing proteins. Genome Research 17(1). Woodbury, New York: Cold Spring Harbor Laboratory Press, ISSN 1088-9051/07, pages 1118-1127.[Link to paper via Genome Research On-line] Abstract:Over 3% of human proteins contain single amino acid repeats (repeat-containing proteins, RCPs). Many repeats (homopeptides) localize to important proteins involved in transcription, and the expansion of certain repeats, in particular poly-Q and poly-A tracts, can also lead to the development of neurological diseases. Previous studies have suggested that the homopeptide makeup is a result of the presence of G+C-rich tracts in the encoding genes and that expansion occurs via replication slippage. Here, we have performed a large-scale genomic analysis of the variation of the genes encoding RCPs in 13 species and present these data in an online database (http://repeats.med.monash.edu.au/genetic_analysis/). This resource allows rapid comparison and analysis of RCPs, homopeptides, and their underlying genetic tracts across the eukaryotic species considered. We report three major findings. First, there is a bias for a small subset of codons being reiterated within homopeptides, and there is no G+C or A+T bias relative to the organisms transcriptome. Second, single base pair transversions from the homocodon are unusually common and may represent a mechanism of reducing the rate of homopeptide mutations. Third, homopeptides that are conserved across different species lie within regions that are under stronger purifying selection in contrast to nonconserved homopeptides.Keywords: Bioinformatics
126.Yang, Y., G.I. Webb, J. Cerquides, K. Korb, J. Boughton, and K-M. Ting (2007). To Select or To Weigh: A Comparative Study of Linear Combination Schemes for SuperParent-One-Dependence Estimators. IEEE Transactions on Knowledge and Data Engineering (TKDE) 19(12). Los Alamitos, CA: IEEE Computer Society, pages 1652-1665.[Pre-publication PDF][ Link to paper via IEEE] Abstract:We conduct a large-scale comparative study on linearly combining superparent-one-dependence estimators (SPODEs), a popular family of semi-naive Bayesian classifiers. Altogether 16 model selection and weighing schemes, 58 benchmark data sets, as well as various statistical tests are employed. This papers main contributions are three-fold. First, it formally presents each schemes definition, rationale and time complexity; and hence can serve as a comprehensive reference for researchers interested in ensemble learning. Second, it offers bias-variance analysis for each schemes classification error performance. Third, it identifies effective schemes that meet various needs in practice. This leads to accurate and fast classification algorithms with immediate and significant impact on real-world applications. Another important feature of our study is using a variety of statistical tests to evaluate multiple learning methods across multiple data sets.Keywords: Conditional Probability Estimation,AODE
125.Zheng, F. and G.I. Webb (2007). Finding the Right Family: Parent and Child Selection for Averaged One-Dependence Estimators. In Lecture Notes in Artificial Intelligence 4710: Proceedings of the 18th European Conference on Machine Learning (ECML'07) Warsaw, Poland. Berlin/Heidelberg: Springer-Verlag, pages 490-501.[Pre-publication PDF] Abstract:Averaged One-Dependence Estimators (AODE) classifies by uniformly aggregating all qualified one-dependence estimators (ODEs). Its capacity to significantly improve naive Bayes' accuracy without undue time complexity has attracted substantial interest. Forward Sequential Selection and Backwards Sequential Elimination are effective wrapper techniques to identify and repair harmful interdependencies which have been profitably applied to naive Bayes. However, their straightforward application to AODE has previously proved ineffective. We investigate novel variants of these strategies. Our extensive experiments show that elimination of child attributes from within the constituent ODEs results in a significant improvement in probability estimate and reductions in bias and error relative to unmodified AODE. In contrast, elimination of complete constituent ODEs and the four types of attribute addition are found to be less effective and do not demonstrate any strong advantage over AODE. These surprising results lead to effective techniques for improving AODE's prediction accuracy.Keywords: Conditional Probability Estimation,AODE
124.Zheng, F. and G.I. Webb (2006). Efficient Lazy Elimination for Averaged One-Dependence Estimators. In W. Cohen and A. Moore (Eds.), ACM International Conference Proceeding Series, Vol. 148: The Proceedings of the Twenty-third International Conference on Machine Learning (ICML'06) Pittsburgh, Pennsylvania. New York, NY: ACM Press, pages 1113 - 1120.[Pre-publication PDF][Link to paper via ACM Portal] Abstract:Semi-naive Bayesian classifiers seek to retain the numerous strengths of naive Bayes while reducing error by weakening the attribute independence assumption. Backwards Sequential Elimination (BSE) is a wrapper technique for attribute elimination that has proved effective at this task. We explore a new efficient technique, Lazy Elimination (LE), which eliminates highly related attribute-values at classification time without the computational overheads inherent in wrapper techniques. We analyze the effect of LE and BSE on Averaged One-Dependence Estimators (AODE), a state-of-the-art semi-naive Bayesian algorithm. Our extensive experiments show that LE significantly reduces bias and error without undue additional computation, while BSE significantly reduces bias but not error, with high training time complexity. In the context of AODE, LE has a significant advantage over BSE in both computational efficiency and error.Keywords: Conditional Probability Estimation,AODE
123.Webb, G.I. (2006). Discovering Significant Rules. In L. Ungar, M. Craven, D. Gunopulos and T. Eliassi-Rad (Eds.), Proceedings of the Twelfth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD-2006) Philadelphia, PA. New York: The Association for Computing Machinery, pages 434 - 443.[Pre-publication PDF][Download from ACM Portal] Abstract:In many applications, association rules will only be interesting if they represent non-trivial correlations between all constituent items. Numerous techniques have been developed that seek to avoid false discoveries. However, while all provide useful solutions to aspects of this problem, none provides a generic solution that is both flexible enough to accommodate varying definitions of true and false discoveries and powerful enough to provide strict control over the risk of false discoveries. This paper presents generic techniques that allow definitions of true and false discoveries to be specified in terms of arbitrary statistical hypothesis tests and which provide strict control over the experimentwise risk of false discoveries.Keywords: OPUS, Association rules, Rule discovery, statistically sound discovery
122.Yang, Y., G.I. Webb, J. Cerquides, K. Korb, J. Boughton, and K-M. Ting (2006). To Select or To Weigh: A Comparative Study of Model Selection and Model Weighing for SPODE Ensembles. In J. Furkranz, T. Scheffer and M. Spiliopoulou (Eds.), Lecture Notes in Computer Science 4212: Proceedings of the 17th European Conference on Machine Learning (ECML'06) Berlin, Germany. Berlin/Heidelberg: Springer-Verlag, pages 533-544.[Pre-publication PDF][Link to paper via Springerlink] Abstract:An ensemble of Super-Parent-One-Dependence Estimators (SPODEs) offers a powerful yet simple alternative to naive Bayes classifiers, achieving significantly higher classification accuracy at a moderate cost in classification efficiency. Currently there exist two families of methodologies that ensemble candidate SPODEs for classification. One is to select only helpful SPODEs and uniformly average their probability estimates, a type of model selection. Another is to assign a weight to each SPODE and linearly combine their probability estimates, a methodology named model weighing. This paper presents a theoretical and empirical study comparing model selection and model weighing for ensembling SPODEs. The focus is on maximizing the ensemble's classification accuracy while minimizing its computational time. A number of representative selection and weighing schemes are studied, providing a comprehensive research on this topic and identifying effective schemes that provide alternative trades-offs between speed and expected errorKeywords: Conditional Probability Estimation,AODE
121.Lu, J., Y. Yang, and G.I. Webb (2006). Incremental Discretization for Naive-Bayes Classifier. In Xue Li, Osmar R. Zaane and Zhanhuai Li (Eds.), Lecture Notes in Computer Science 4093: Proceedings of the Second International Conference on Advanced Data Mining and Applications (ADMA 2006) Xian, China. Berlin: Springer, pages 223-238.[Pre-publication PDF][Link to paper via Springerlink] Abstract:Nave-Bayes classifiers (NB) support incremental learning. However, the lack of effective incremental discretization methods has been hindering NB's incremental learning in face of quantitative data. This problem is further compounded by the fact that quantitative data are everywhere, from temperature readings to share prices. In this paper, we present a novel incremental discretization method for NB, incremental flexible frequency discretization (IFFD). IFFD discretizes values of a quantitative attribute into a sequence of intervals of flexible sizes. It allows online insertion and splitting operation on intervals. Theoretical analysis and experimental test are conducted to compare IFFD with alternative methods. Empirical evidence suggests that IFFD is efficient and effective. NB coupled with IFFD achieves a rapport between high learning efficiency and high classification accuracy in the context of incremental learning.Keywords:
120.Webb, G.I. (2006). Anytime Learning and Classification for Online Applications. In Y. Li, M. Looi and N. Zhong (Eds.), Advances in Intelligent IT: Proceedings of the Fourth International Conference on Active Media Technology (AMT'06). [Extended Abstract] Brisbane, Australia. Amsterdam: IOS Press, pages 7-12.[Pre-publication PDF] Abstract:Many online applications of machine learning require fast classification and hence utilise efficient classifiers such as naive Bayes. However, outside periods of peak computation load, additional computational resources will often be available. Anytime classification can use whatever computational resources may be available at classification time to improve the accuracy of the classifications made.Keywords:
119.Webb, G.I. and D. Brain (2006). Generality is Predictive of Prediction Accuracy. In LNAI State-of-the-Art Survey series, 'Data Mining: Theory, Methodology, Techniques, and Applications'. Berlin/Heidelberg: Springer, pages 1-13, (Note: an earlier version of this paper was published in the Proceedings of PKAW 2002, pp 117-130).[Pre-publication PDF][Link to paper via Springerlink] Abstract:During knowledge acquisition it frequently occurs that multiple alternative potential rules all appear equally credible. This paper addresses the dearth of formal analysis about how to select between such alternatives. It presents two hypotheses about the expected impact of selecting between classification rules of differing levels of generality in the absence of other evidence about their likely relative performance on unseen data. We argue that the accuracy on unseen data of the more general rule will tend to be closer to that of a default rule for the class than will that of the more specific rule. We also argue that in comparison to the more general rule, the accuracy of the more specific rule on unseen cases will tend to be closer to the accuracy obtained on training data. Experimental evidence is provided in support of these hypotheses. These hypotheses can be useful for selecting between rules in order to achieve specific knowledge acquisition objectives.Keywords: Generality
118.Huang, S. and G.I. Webb (2006). Efficiently Identifying Exploratory Rules' Significance. In LNAI State-of-the-Art Survey series, 'Data Mining: Theory, Methodology, Techniques, and Applications'. Berlin/Heidelberg: Springer, pages 64-77, (Note: an earlier version of this paper was published in S.J. Simoff and G.J. Williams (Eds.), Proceedings of the Third Australasian Data Mining Conference (AusDM04) Cairns, Australia. Sydney: University of Technology, pages 169-182.).[Link to paper via Springerlink] Abstract:How to efficiently discard potentially uninteresting rules in exploratory rule discovery is one of the important research foci in data mining. Many researchers have presented algorithms to automatically remove potentially uninteresting rules utilizing background knowledge and user-specified constraints. Identifying the significance of exploratory rules using a significance test is desirable for removing rules that may appear interesting by chance, hence providing the users with a more compact set of resulting rules. However, applying statistical tests to identify significant rules requires considerable computation and data access in order to obtain the necessary statistics. The situation gets worse as the size of the database increases. In this paper, we propose two approaches for improving the efficiency of significant exploratory rule discovery. We also evaluate the experimental effect in impact rule discovery which is suitable for discovering exploratory rules in very large, dense databases.Keywords: Association Rule Discovery, statistically sound discovery, OPUS
117.Webb, G. I., J. Boughton, and Z. Wang (2005). Not So Naive Bayes: Aggregating One-Dependence Estimators. Machine Learning 58(1). Netherlands: Springer, pages 5-24.[Pre-publication PDF][Link to paper via Springerlink] Abstract:Of numerous proposals to improve the accuracy of naive Bayes by weakening its attribute independence assumption, both LBR and TAN have demonstrated remarkable error performance. However, both techniques obtain this outcome at a considerable computational cost. We present a new approach to weakening the attribute independence assumption by averaging all of a constrained class of classifiers. In extensive experiments this technique delivers comparable prediction accuracy to LBR and TAN with substantially improved computational efficiency.Keywords: Conditional Probability Estimation,AODE
116.Webb, G. I. and K.M. Ting (2005). On the Application of ROC Analysis to Predict Classification Performance Under Varying Class Distributions. Machine Learning 58(1). Netherlands: Springer, pages 25-32.[Prepublication PDF][Link to paper via Springerlink] Abstract:We counsel caution in the application of ROC analysis for prediction of classifier accuracy under varying class distributions. The heart of our contention is that in real-world applications variations of class distribution are likely to result from forces that affect the distribution of the attribute-values, rather than forces that directly affect the class distribution. In statistical terms, it is usually the class, rather than the attributes, that is the dependent variable. If the class distribution alters as an indirect consequence of changes in the distribution of the attribute values, rather than vice versa, performance estimates derived through ROC analysis may be grossly inaccurate.
115.Webb, G. I. and S. Zhang (2005). k-Optimal-Rule-Discovery. Data Mining and Knowledge Discovery 10(1). Netherlands: Springer, pages 39-79.[Prepublication PDF][Link to paper via Springerlink] Abstract:K-most-interesting rule discovery finds the k rules that optimize a user-specified measure of interestingness with respect to a set of sample data and user-specified constraints. This approach avoids many limitations of the frequent itemset approach of association rule discovery. This paper presents a scalable algorithm applicable to a wide range of k-most-interesting rule discovery tasks and demonstrates its efficiency.Keywords: Association Rule Discovery, statistically sound discovery, OPUS
114.Siu, K.K.W., S.M. Butler, T. Beveridge, J.E. Gillam, C.J. Hall, A.H. Kaye, R.A. Lewis, K. Mannan, G. McLoughlin, S. Pearson, A.R. Round, E. Schultke, G.I. Webb, and S.J. Wilkinson (2005). Identifying markers of pathology in SAXS data of malignant tissues of the brain. Nuclear Instruments and Methods in Physics Research A 548. Elsevier, pages 140-146.[Pre-publication PDF][Link to paper via Science Direct] Abstract:Conventional neuropathological analysis for brain malignancies is heavily reliant on the observation of morphological abnormalities, observed in thin, stained sections of tissue. Small Angle X-ray Scattering (SAXS) data provide an alternative means of distinguishing pathology by examining the ultra-structural (nanometer length scales) characteristics of tissue. To evaluate the diagnostic potential of SAXS for brain tumors, data was collected from normal, malignant and benign tissues of the human brain at station 2.1 of the Daresbury Laboratory Synchrotron Radiation Source and subjected to data mining and multivariate statistical analysis. The results suggest SAXS data may be an effective classi.er of malignancy.Keywords: Bioinformatics
113.Huang, S. and G.I. Webb (2005). Discarding Insignificant Rules During Impact Rule Discovery in Large, Dense Databases. In H. Kargupta, C. Kamath, J. Srivastava and A. Goodman (Eds.), Proceedings of the Fifth SIAM International Conference on Data Mining (SDM'05) [short paper] Newport Beach, CA. Philadelphia, PA: Society for Industrial and Applied Mathematics, pages 541-545.[Pre-publication PDF][Link to SIAM] Abstract:Considerable progress has been made on how to reduce the number of spurious exploratory rules with quantitative attributes. However, little has been done for rules with undiscretized quantitative attributes. It is argued that propositional rules can not effectively describe the interactions between quantitative and qualitative attributes. Aumann and Lindell proposed quantitative association rules to provide a better description of such relationship, together with a rule pruning techniques . Since their technique is based on the frequent itemset framework, it is not suitable for rule discovery in large, dense databases. In this paper, an efficient technique for automatically discarding insignificant rules during rule discovery is proposed, based on the OPUS search algorithm. Experiments demonstrate that the algorithm we propose can efficiently remove potentially uninteresting rules even in very large, dense databases.Keywords: impact rules, opus
112.Huang, S. and G.I. Webb (2005). Pruning Derivative Partial Rules During Impact Rule Discovery. In T.B. Ho, D. Cheung and H. Liu (Eds.), Lecture Notes in Computer Science Vol. 3518: Proceedings of the 9th Pacific-Asia Conference on Advances in Knowledge Discovery and Data Mining (PAKDD 2005) Hanoi, Vietnam. Berlin/Heidelberg: Springer, pages 71-80.[Pre-publication PDF][Link to paper via Springerlink] Abstract:Because exploratory rule discovery works with data that is only a sample of the phenomena to be investigated, some resulting rules may appear interesting only by chance. Techniques are developed for automatically discarding statistically insignificant exploratory rules that cannot survive a hypothesis with regard to its ancestors. We call such insignificant rules derivative extended rules. In this paper, we argue that there is another type of derivative exploratory rules, which is derivative with regard to their children. We also argue that considerable amount of such derivative partial rules can not be successfully removed using existing rule pruning techniques. We propose a new technique to address this problem. Experiments are done in impact rule discovery to evaluate the effect of this derivative partial rule filter. Results show that the inherent problem of too many resulting rules in exploratory rule discovery is alleviated.Keywords: Impact Rules
111.Yang, Y., K. Korb, K-M. Ting, and G.I. Webb (2005). Ensemble Selection for SuperParent-One-Dependence Estimators. In S. Zhang and R. Jarvis (Eds.), Lecture Notes in Computer Science 3809: Advances in Artificial Intelligence, Proceedings of the 18th Australian Joint Conference on Artificial Intelligence (AI 2005) Sydney, Australia. Berlin/Heidelberg: Springer, pages 102-111.[Pre-publication PDF][Link to paper via Springerlink] Abstract:SuperParent-One-Dependence Estimators (SPODEs) loosen Naive-Bayes' attribute independence assumption by allowing each attribute to depend on a common single attribute (superparent) in addition to the class. An ensemble of SPODEs is able to achieve high classification accuracy with modest computational cost. This paper investigates how to select SPODEs for ensembling. Various popular model selection strategies are presented. Their learning efficacy and efficiency are theoretically analyzed and empirically verified. Accordingly, guidelines are investigated for choosing between selection criteria in differing contexts.Keywords: Conditional Probablity Estimation, AODE
110.Zheng, F. and G.I. Webb (2005). A Comparative Study of Semi-naive Bayes Methods in Classification Learning. In S.J. Simoff, G.J. Williams, J. Galloway and I. Kolyshkina (Eds.), Proceedings of the Fourth Australasian Data Mining Conference (AusDM05) Sydney, Australia. Sydney: University of Technology, pages 141-156.[Pre-publication PDF] Abstract:Numerous techniques have sought to improve the accuracy of Naive Bayes (NB) by alleviating the attribute interdependence problem. This paper summarizes these semi-naive Bayesian methods into two groups: those that apply conventional NB with a new attribute set, and those that alter NB by allowing inter-dependencies between attributes. We review eight typical semi-naive Bayesian learning algorithms and perform error analysis using the bias-variance decomposition on thirty-six natural domains from the UCI Machine Learning Repository. In analysing the results of these experiments we provide general recommendations for selection between methods.Keywords: AODE, Conditional Probability Estimation
109.Webb, G.I. and Z. Zheng (2004). Multistrategy Ensemble Learning: Reducing Error by Combining Ensemble Learning Techniques. IEEE Transactions on Knowledge and Data Engineering 16(8). Los Alamitos, CA: IEEE Computer Society, pages 980-991.[PDF][ Link to paper via IEEE] Abstract:Ensemble learning strategies, especially Boosting and Bagging decision trees, have demonstrated impressive capacities to improve the prediction accuracy of base learning algorithms. Further gains have been demonstrated by strategies that combine simple ensemble formation approaches. In this paper, we investigate the hypothesis that the improvement inaccuracy of multi-strategy approaches to ensemble learning is due to an increase in the diversity of ensemble members that are formed. In addition, guided by this hypothesis, we develop three new multi-strategy ensemble-learning techniques. Experimental results in a wide variety of natural domains suggest that these multi-strategy ensemble-learning techniques are, on average, more accurate than their component ensemble learning techniquesKeywords: Multiboosting, Boosting
108.Thiruvady, D. R. and G. I. Webb (2004). Mining Negative Rules using GRD. In H. Dai, R. Srikant and C. Zhang (Eds.), Lecture Notes in Computer Science Vol. 3056: Proceedings of the Eighth Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD 04) [Short Paper] Sydney, Australia. Berlin/Heidelberg: Springer, pages 161-165.[Pre-publication PDF][Link to paper via Springerlink] Abstract:GRD is an algorithm for k-most interesting rule discovery. In contrast to association rule discovery, GRD does not require the use of a minimum support constraint. Rather, the user must specify a measure of interestingness and the number of rules sought (k). This paper reports efficient techniques to extend GRD to support mining of negative rules. We demonstrate that the new approach provides tractable discovery of both negative and positive rules.Keywords: opus
107.Wang, Z., G.I. Webb, and F. Zheng (2004). Selective Augmented Bayesian Network Classifiers Based on Rough Set Theory. In H. Dai, R. Srikant and C. Zhang (Eds.), Lecture Notes in Computer Science Vol. 3056: Proceedings of the Eighth Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD 04) Sydney, Australia. Berlin/Heidelberg: Springer, pages 319-328.[Pre-publication PDF][Link to paper via Springerlink] Abstract:The naive Bayes classifier is widely used in interactive applications due to its computational efficiency, direct theoretical base, and competitive accuracy. However, its attribute independence assumption can result in sub-optimal accuracy. A number of techniques have explored simple relaxations of the attribute independence assumption in order to increase accuracy. TAN, Tree-Augmented Naive Bayes, is a state-of-the-art extension of naive Bayes, that can express limited forms of inter-dependence among attributes. Rough sets theory provides tools for expressing inexact or partial dependencies within dataset. In this paper, we present a variant of TAN and compare their tree classifier structures, which can be thought of as a selective restricted trees Bayesian classifier. It delivers lower error than both pre-existing state-of-the-art TAN-based classifiers, with substantially less computation than is required by the SuperParent approach.
106.Newlands, D.A. and G.I. Webb (2004). Convex Hulls as an Hypothesis Language Bias. In N.F.F.E. Ebecken, C.A. Brebbia and A. Zanasi (Eds.), Proceedings of the Fourth International Conference on Data Mining (DATA MINING IV) Rio de Janeiro, Brazil. Southampton, UK: WIT Press, pages 285-294.[Pre-publication PDF][Link to WIT Press] Abstract:Classification learning is dominated by systems which induce large numbers of small axis-orthogonal decision surfaces which biases such systems towards particular hypothesis types. However, there is reason to believe that many domains have underlying concepts which do not involve axis orthogonal surfaces. Further, the multiplicity of small decision regions mitigates against any holistic appreciation of the theories produced by these systems, notwithstanding the fact that many of the small regions are individually comprehensible. We propose the use of less strongly biased hypothesis languages which might be expected to model concepts using a number of structures close to the number of actual structures in the domain. An instantiation of such a language, a convex hull based classifier, CH1, has been implemented to investigate modeling concepts as a small number of large geometric structures in n-dimensional space. A comparison of the number of regions induced is made against other well-known systems on a representative selection of largely or wholly continuous valued machine learning tasks. The convex hull system is shown to produce a number of induced regions about an order of magnitude less than well-known systems and very close to the number of actual concepts. This representation, as convex hulls, allows the possibility of extraction of higher level mathematical descriptions of the induced concepts, using the techniques of computational geometry.
105.Newlands, D.A. and G.I. Webb (2004). Alternative Strategies for Decision List Construction. In N.F.F.E. Ebecken, C.A. Brebbia and A. Zanasi (Eds.), Proceedings of the Fourth International Conference on Data Mining (DATA MINING IV) Rio de Janeiro, Brazil. Southampton, UK: WIT Press, pages 265-273.[Pre-publication PDF][Link to WIT Press] Abstract:This work surveys well-known approaches to building decision lists. Some novel variations to strategies based on default rules for the most common class and insertion of new rules before the default rule are presented. These are expected to offer speed up in the construction of the decision list as well as compression of the length of the list. These strategies and a testing regime have been implemented and some empirical studies done to compare the strategies. Experimental results are presented and interpreted. We show that all strategies deliver decision lists of comparable accuracy. However, two techniques are shown to deliver this accuracy with lists composed of significantly fewer rules than alternative strategies. Of these, one also demonstrates significant computational advantages. The prepending strategy is also demonstrated to produce decision lists which are as much as an order of magnitude shorter than those produced by CN2.Keywords: prepend
104.Huang, S. and G.I. Webb (2004). Efficiently Identifying Exploratory Rules' Significance. In S.J. Simoff and G.J. Williams (Eds.), Proceedings of the Third Australasian Data Mining Conference (AusDM04) Cairns, Australia. Sydney: University of Technology, pages 169-182.[Pre-publication PDF] Abstract:How to efficiently discard potentially uninteresting rules in exploratory rule discovery is one of the important research foci in data mining. Many researchers have presented algorithms to automatically remove potentially uninteresting rules utilizing background knowledge and user-specified constraints. Identifying the significance of exploratory rules using a significance test is desirable for removing rules that may appear interesting by chance, hence providing the users with a more compact set of resulting rules. However, applying statistical tests to identify significant rules requires considerable computation and data access in order to obtain the necessary statistics. The situation gets worse as the size of the database increases. In this paper, we propose two approaches for improving the efficiency of significant exploratory rule discovery. We also evaluate the experimental effect in impact rule discovery which is suitable for discovering exploratory rules in very large, dense databases.Keywords: Association Rule Discovery,statistically sound discovery, OPUS
103.Zhang, C., S. Zhang, and G. I. Webb (2003). Identifying Approximate Item-Sets Of Interest In Large Databases. Applied Intelligence 18. Netherlands: Springer, pages 91-104.[Link to paper via Springerlink] Abstract:This paper presents a method for discovering approximate frequent itemsets of interest in large scale databases. This method uses the central limit theorem to increase efficiency, enabling us to reduce the sample size by about half compared to previous approximations. Further efficiency is gained by pruning from the search space uninteresting frequent itemsets. In addition to improving efficiency, this measure also reduces the number of itemsets that the user need consider. The model and algorithm have been implemented and evaluated using both synthetic and real-world databases. Our experimental results demonstrate the efficiency of the approachKeywords: Learning from large datasets
102.Webb, G. I., S. Butler, and D. Newlands (2003). On Detecting Differences Between Groups. In P. Domingos, C. Faloutsos, T. Senator, H. Kargupta and L. Getoor (Eds.), Proceedings of The Ninth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD-2003) Washington, DC. New York: The Association for Computing Machinery, pages 256-265.[PDF][Paper via ACM Portal] Abstract:Understanding the differences between contrasting groups is a fundamental task in data analysis. This realization has led to the development of a new special purpose data mining technique, {\em contrast-set mining}. We undertook a study with a retail collaborator to compare contrast-set mining with existing rule-discovery techniques. To our surprise we observed that straightforward application of an existing commercial rule-discovery system, Magnum Opus, could successfully perform the contrast-set-mining task. This led to the realization that contrast-set mining is a special case of the more general rule-discovery task. We present the results of our study together with a proof of this conclusionKeywords: OPUS
101.Yang, Y. and G.I. Webb (2003). Weighted Proportional k-Interval Discretization for Naive-Bayes Classifiers. In K-Y. Whang, J. Jeon, K. Shim and J. Srivastava (Eds.), Lecture Notes in Artificial Intelligence Vol. 2637: Proceedings of the Seventh Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD'03) Seoul, Korea. Berlin/Heidelberg: Springer-Verlag, pages 501-512.[Pre-publication PDF][Link to paper via Springerlink] Abstract:The use of different discretization techniques can be expected to affect the bias and variance of a learning algorithm. We call such an effect discretization bias and variance. Proportional k-interval discretization (PKID) tunes discretization bias and variance by adjusting discretized interval size and number proportional to the number of training instances. Theoretical analysis suggests that this is desirable for naive-Bayes classifiers. However PKID has sub-optimal performance when learning from small training data. We argue that this is because PKID equally weighs bias reduction and variance reduction. But for small data, variance reduction can contribute more to lower learning error and thus should be given greater weight than bias reduction. Accordingly we propose weighted proportional k-interval discretization (WPKID), which establishes a more suitable bias and variance trade-off for small data while allowing additional training data to be used to reduce both bias and variance. Our experiments demonstrate that for naive-Bayes classifiers, WPKID improves upon PKID for smaller datasets with significant frequency; and WPKID delivers lower classification error significantly more often than not in comparison to the other three leading alternative discretization techniques studied.Keywords: Discretization for Naive Bayes
100.Shi, H., Z. Wang, G.I. Webb, and H. Huang (2003). A New Restricted Bayesian Network Classifier. In K-Y. Whang, J. Jeon, K. Shim and J. Srivastava (Eds.), Lecture Notes in Artificial Intelligence Vol. 2637: Proceedings of the Seventh Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD'03) Seoul, Korea. Berlin/Heidelberg: Springer-Verlag, pages 265-270.[Pre-publication PDF][Link to paper via Springerlink] Abstract:On the basis of examining the existing restricted Bayesian network classifiers, a new Bayes-theorem-based and more strictly restricted Bayesian-network-based classification model DLBAN is proposed, which can be viewed as a double-level Bayesian network augmented naive Bayes classification. The experimental results show that the DLBAN classifier is better than the TAN classifier in the most cases.Keywords: Conditional Probability Estimation
99.Wang, Z., G.I. Webb, and F. Zheng (2003). Adjusting Dependence Relations for Semi-Lazy TAN Classifiers. In T.D. Gedeon and L.C.C. Fung (Eds.), Lecture Notes in Artificial Intelligence Vol. 2903: Proceedings of the 16th Australian Conference on Artificial Intelligence (AI 03) Perth, Australia. Berlin/Heidelberg: Springer, pages 453-465.[Pre-publication PDF ][Link to paper via Springerlink] Abstract:The naive Bayesian classifier is a simple and effective classification method, which assumes a Bayesian network in which each attribute has the class label as its only one parent. But this assumption is not obviously hold in many real world domains. Tree-Augmented Nave Bayes (TAN) is a state-of-the-art extension of the naive Bayes, which can express partial dependence relations among attributes. In this paper, we analyze the implementations of two different TAN classifiers and their tree structures. Experiments show how different dependence relations impact on accuracy of TAN classifiers. We present a kind of semi-lazy TAN classifier, which builds a TAN identical to the original TAN at training time, but adjusts the dependence relations for a new test instance at classification time. Our extensive experimental results show that this kind of semi-lazy classifier delivers lower error than the original TAN and is more efficient than SuperParent TAN.Keywords: Conditional Probability Estimation
98.Butler, S. M., G.I. Webb, and R.A. Lewis (2003). A Case Study in Feature Invention for Breast Cancer Diagnosis Using X-Ray Scatter Images. In T.D. Gedeon and L.C.C. Fung (Eds.), Lecture Notes in Artificial Intelligence Vol. 2903: Proceedings of the 16th Australian Conference on Artificial Intelligence (AI 03) Perth, Australia. Berlin/Heidelberg: Springer, pages 677-685.[Pre-publication PDF][Link to paper via Springerlink] Abstract:X-ray mammography is the current method for screening for breast cancer, and like any technique, has its limitations. Several groups have reported differences in the X-ray scattering patterns of normal and tumour tissue from the breast. This gives rise to the hope that X-ray scatter analysis techniques may lead to a more accurate and cost effective method of diagnosing beast cancer which lends itself to automation. This is a particularly challenging exercise due to the inherent complexity of the information content in X-ray scatter patterns from complex heterogenous tissue samples. We use a simple naive Bayes classier, coupled with Equal Frequency Discretization (EFD) as our classification system. High-level features are extracted from the low-level pixel data. This paper reports some preliminary results in the ongoing development of this classification method that can distinguish between the diffraction patterns of normal and cancerous tissue, with particular emphasis on the invention of features for classification.Keywords: Bioinformatics
97.Yang, Y. and G. I. Webb (2003). On Why Discretization Works for Naive-Bayes Classifiers. In T.D. Gedeon and L.C.C. Fung (Eds.), Lecture Notes in Artificial Intelligence Vol. 2903: Proceedings of the 16th Australian Conference on Artificial Intelligence (AI 03) Perth, Australia. Berlin/Heidelberg: Springer, pages 440-452.[Pre-publication PDF][Link to paper via Springerlink] Abstract:We investigate why discretization is effective in naive-Bayes learning. We prove a theorem that identifies particular conditions under which discretization will result in naive Bayes classifiers delivering the same probability estimates as would be obtained if the correct probability density functions were employed. We discuss the factors that might affect naive-Bayes classification error under discretization. We suggest that the use of different discretization techniques can affect the classification bias and variance of the generated classifiers, an effect named discretization bias and variance. We argue that by properly managing discretization bias and variance, we can effectively reduce naive-Bayes classification errorKeywords: Discretization for Naive Bayes
96.Webb, G.I. (2003). Preliminary Investigations into Statistically Valid Exploratory Rule Discovery. In S.J. Simoff, G.J. Williams and M. Hegland (Eds.), Proceedings of the Second Australasian Data Mining Conference (AusDM03) Canberra, Australia. Sydney: University of Technology, pages 1-9.[Pre-publication PDF] Abstract:Exploratory rule discovery, as exemplified by association rule discovery, has proven very popular. In this paper I investigate issues surrounding the statistical validity of rules found using this approach and methods that might be employed to deliver statistically sound exploratory rule discovery.Keywords: Association Rule Discovery,statistically sound discovery, OPUS
95.Yang, Y. and G. I. Webb (2002). Non-Disjoint Discretization for Naive-Bayes Classifiers. In C. Sammut and A.G. Hoffmann (Eds.), Proceedings of the Nineteenth International Conference on Machine Learning (ICML '02) Sydney, Australia. San Francisco: Morgan Kaufmann, pages 666-673.[Pre-publication PDF] Abstract:Previous discretization techniques have discretized numeric attributes into disjoint intervals. We argue that this is neither necessary nor appropriate for naive-Bayes classifiers. The analysis leads to a new discretization method, Non-Disjoint Discretization (NDD). NDD forms overlapping intervals for a numeric attribute, always locating a value toward the middle of its discretized interval to obtain more reliable probability estimation. It also adjusts the number and size of discretized intervals to the number of training instances, seeking an appropriate trade-off between bias and variance of probability estimation. We justify NDD in theory and test it on a wide cross-section of datasets. Our experimental results suggest that for naive-Bayes classifiers, NDD works better than alternative discretization approaches.Keywords: Discretization for Naive Bayes
94.Pearce, J., G. I. Webb, R. Shaw, and B. Garner (2002). A Framework for Experimentation and Self Learning in Continuous Database Marketing. In Proceedings of the IEEE International Conference on Data Mining (ICDM-2002) Maebashi City, Japan. Los Alamitos, CA: IEEE Computer Society, pages 490-497.[PDF][Link to paper via IEEE xplore] Abstract:We present a method for continuous database marketing that identifies target customers for a number of marketing offers using predictive models. The algorithm then selects the appropriate offer for the customer. Experimental design principles are encapsulated to capture more information that will be used to monitor and refine the predictive models. The updated predictive models are then used for the next round of marketing offers.
93.Wang, Z. and G.I. Webb (2002). Comparison of Lazy Bayesian Rule Learning and Tree-Augmented Bayesian Learning. In Proceedings of the IEEE International Conference on Data Mining (ICDM-2002) Maebashi City, Japan. Los Alamitos, CA: IEEE Computer Society, pages 775-778.[PDF][Link to paper via IEEE xplore] Abstract:The nave Bayes classifier is widely used in interactive applications due to its computational efficiency, direct theoretical base, and competitive accuracy. However, its attribute independence assumption can result in sub-optimal accuracy. A number of techniques have explored simple relaxations of the attribute independence assumption in order to increase accuracy. Among these, Lazy Bayesian Rules (LBR) and Tree-Augmented Nave-Bayes (TAN) have demonstrated strong prediction accuracy. However, their relative performance has never been evaluated. This paper compares and contrasts these two techniques, finding that they have comparable accuracy and hence should be selected according to computational profile. LBR is desirable when small numbers of objects are to be classified while TAN is desirable when large numbers of objects are to be classifiedKeywords: Conditional Probability Estimation
92.Brain, D. and G.I. Webb (2002). The Need for Low Bias Algorithms in Classification Learning From Large Data Sets. In Lecture Notes in Computer Science 2431: Principles of Data Mining and Knowledge Discovery: Proceedings of the Sixth European Conference (PKDD 2002) Helsinki, Finland. Berlin/Heidelberg: Springer-Verlag, pages 62-73.[Pre-publication PDF][Link to paper via Springerlink] Abstract:This paper reviews the appropriateness for application to large data sets of standard machine learning algorithms, which were mainly developed in the context of small data sets. Sampling and parallelization have proved useful means for reducing computation time when learning from large data sets. However, such methods assume that algorithms that were designed for use with what are now considered small data sets are also fundamentally suitable for large data sets. It is plausible that optimal learning from large data sets requires a different type of algorithm to optimal learning from small data sets. This paper investigates one respect in which data set size may affect the requirements of a learning algorithm the bias plus variance decomposition of classification error. Experiments show that learning from large data sets may be more effective when using an algorithm that places greater emphasis on bias management, rather than variance managementKeywords: Learning from large datasets
91.Frayman, Y., B. Rolfe, P. Hodgson, and G. I. Webb (2002). Predicting The Rolling Force in Hot Steel Rolling Mill using an Ensemble Model. In Proceedings of the Second IASTED International Conference on Artificial Intelligence and Applications (AIA '02) Benalmdena, Spain. Calgary, Canada: ACTA Press, pages 143-148.[Prepublication PDF] Abstract:Accurate prediction of the roll separating force is critical to assuring the quality of the final product in steel manufacturing. This paper presents an ensemble model that addresses these concerns. A stacked generalisation approach to ensemble modeling is used with two sets of the ensemble model members, the first set being learnt from the current input-output data of the hot rolling finishing mill, while another uses the available information on the previous coil in addition to the current information. Both sets of ensemble members include linear regression, multilayer perceptron, and k-nearest neighbor algorithms. A competitive selection model (multilayer perceptron) is then used to select the output from one of the ensemble members to be the final output of the ensemble model. The ensemble model created by such a stacked generalization is able to achieve extremely high accuracy in predicting the roll separation force with the average relative accuracy being within 1% of the actual measured roll force.Keywords: Engineering Applications
90.Rolfe, B., Y Frayman, P. Hodgson, and G. I. Webb (2002). Fault Detection in a Cold Forging Process Through Feature Extraction with a Neural Network. In Proceedings of the IASTED International Conference on Artificial Intelligence and Applications (AIA 2002) Benalmdena, Spain. Calgary, Canada: ACTA Press, pages 155-159.[Pre-publication PDF] Abstract:This paper investigates the application of neural networks to the recognition of lubrication defects typical to an industrial cold forging process employed by fastener manufacturers. The accurate recognition of lubrication errors, such as coating not being applied properly or damaged during material handling, is very important to the quality of the final product in fastener manufacture. Lubrication errors lead to increased forging loads and premature tool failure, as well as to increased defect sorting and the re-processing of the coated rod. The lubrication coating provides a barrier between the work material and the die during the drawing operation; moreover it needs be sufficiently robust to remain on the wire during the transfer to the cold forging operation. In the cold forging operation the wire undergoes multi-stage deformation without the application of any additional lubrication. Four types of lubrication errors, typical to production of fasteners, were introduced to a set of sample rods, which were subsequently drawn under laboratory conditions. The drawing force was measured, from which a limited set of features was extracted. The neural network based model learned from these features is able to recognize all types of lubrication errors to a high accuracy. The overall accuracy of the neural network model is around 98% with almost uniform distribution of errors between all four errors and the normal condition.Keywords: Engineering Applications
89.Webb, G. I. and S. Zhang (2002). Removing Trivial Associations in Association Rule Discovery. In Proceedings of the First International NAISO Congress on Autonomous Intelligent Systems (ICAIS 2002) Geelong, Australia. Canada/The Netherlands: NAISO Academic Press.[Pre-publication PDF] Abstract:Association rule discovery has become one of the most widely applied data mining strategies. Techniques for association rule discovery have been dominated by the frequent itemset strategy as exemplified by the Apriori algorithm. One limitation of this approach is that it provides little opportunity to detect and remove association rules on the basis of relationships between rules. As a result, the association rules discovered are frequently swamped with large numbers of spurious rules that are of little interest to the user. This paper presents association rule discovery techniques that can detect and discard one form of spurious association rule: trivial associations.Keywords: OPUS
88.Frayman, Y, B. Rolfe, and G. I. Webb (2002). Improving an Inverse Model of Sheet Metal Forming by Neural Network Based Regression. In Proceedings of the Design Engineering Technical Conferences and Computer and Information in Engineering Conference (DETC'02/ASME 2002). Montreal, Canada: ASME Press, pages 1-8.[Prepublication PDF] Abstract:The inverse model for a sheet metal forming process aims to determine the initial parameter levels required to form the final formed shape. This is a difficult problem that is usually approached by traditional methods such a finite element analysis. Formulating the problem as a classification problem makes is possible to use a well established classification algorithms such as decision trees. The classification is, however, generally based on a winner-takes-all approach when associating the output value with the corresponding class. On the other hand when formulating the problem as a regression task, all the output values are combined to produce the corresponding class value. For a multi-class problem, this may result in very different associations between the output of the model and the corresponding class. Such formulation makes it possible to use a well known regression algorithms such as neural networks.In this paper, we develop a neural network based inverse model of a sheet forming process, and compare its performance with that of a linear model. Both models are used in two modes: classification mode and a function estimation mode to investigate the advantage of re-formulating the problem as function estimation. This results in large improvements in the recognition rate of set-up parameters of a sheet metal forming process for both models, with a neural network model achieving much more accurate parameters recognition than a linear modelKeywords: Engineering Applications
87.Frayman, Y, B. Rolfe, and G. I. Webb (2002). Solving Regression Problems using Competitive Ensemble Models. In B. McKay and J.K. Slaney (Eds.), Lecture Notes in Computer Science Vol. 2557: Proceedings of the 15th Australian Joint Conference on Artificial Intelligence (AI 02) Canberra, Australia. Berlin/Heidelberg: Springer, pages 511-522.[Prepublication PDF][Link to paper via Springerlink] Abstract:The use of ensemble models in many problem domains has increased significantly in the last few years. The ensemble modelling, in particularly boosting, has shown a great promise in improving predictive performance of a model. Combining the ensemble members is normally done in a co-operative fashion where each of the ensemble members performs the same task and their predictions are aggregated to obtain the improved performance. However, it is also possible to combine the ensemble members in a competitive fashion where the best prediction of a relevant ensemble member is selected for a particular input. This option has been previously somewhat overlooked. The aim of this article is to investigate and compare the competitive and co-operative approaches to combining the models in the ensemble. A comparison is made between a competitive ensemble model and that of MARS with bagging, mixture of experts, hierarchical mixture of experts and a neural network ensemble over several public domain regression problems that have a high degree of nonlinearity and noise. The empirical results show a substantial advantage of competitive learning versus the co-operative learning for all the regression problems investigated. The requirements for creating the efficient ensembles and the available guidelines are also discussed.Keywords: Engineering Applications
86.Pearce, J., G. I. Webb, R. Shaw, and B. Garner (2002). A Systemic Approach to the Database Marketing Process.. In Proceedings of the Australian and New Zealand Marketing Academy Conference (ANZMAC 02) Geelong, Australia. Geelong, Victoria: Deakin University (CD Rom), pages pp 2941-2948.[PDF] Abstract:The role of database marketing (DBM) has become increasingly important for organisations that have large databases of information on customers with whom they deal directly. At the same time, DBM models used in practice have increased in sophistication. This paper examines a systemic view of DBM and the role of analytical techniques within DBM. It extends existing process models to develop a systemic model that encompasses the increased complexity of DBM in practice. The systemic model provides a framework to integrate data mining, experimental design and prioritisation decisions. This paper goes on to identify opportunities for research in DBM, including DBM process models used in practice, the use of evolutionary operations techniques in DBM, prioritisation decisions, and the factors that surround the uptake of DBM.
85.Yang, Y. and G. I. Webb (2002). A Comparative Study of Discretization Methods for Naive-Bayes Classifiers. In T. Yamaguchi, A. Hoffmann, H. Motoda and P. Compton (Eds.), Proceedings of the 2002 Pacific Rim Knowledge Acquisition Workshop (PKAW'02) Tokyo, Japan. Tokyo: Japanese Society for Artificial Intelligence, pages 159-173.[PDF] Abstract:Discretization is a popular approach to handling numeric attributes in machine learning. We argue that the requirements for effective discretization differ between naive-Bayes learning and many other learning algorithms. We evaluate the effectiveness with naive-Bayes classifiers of nine discretization methods, equal width discretization (EWD), equal frequency discretization (EFD), fuzzy discretization (FD), entropy minimization discretization (EMD), iterative discretization (ID), proportional k-interval discretization (PKID), lazy discretization (LD), non-disjoint discretization (NDD) and weighted proportional k-interval discretization (WPKID). It is found that in general naive-Bayes classifiers trained on data preprocessed by LD, NDD or WPKID achieve lower classification error than those trained on data preprocessed by the other discretization methods. But LD can not scale to large data. This study leads to a new discretization method, weighted non-disjoint discretization (WNDD) that combines WPKID and NDD's advantages. Our experiments show that among all the rival discretization methods, WNDD best helps naive-Bayes classifiers reduce average classification error.Keywords: Discretization for Naive Bayes
84.Webb, G. I. and D. Brain (2002). Generality is Predictive of Predication Accuracy. In T. Yamaguchi, A. Hoffmann, H. Motoda and P. Compton (Eds.), Proceedings of the 2002 Pacific Rim Knowledge Acquisition Workshop (PKAW'02) Tokyo, Japan. Tokyo: Japanese Society for Artificial Intelligence, pages 117-130.[PDF] Abstract:There has been a dearth of research into the relative impacts of alternative high level learning biases. This paper presents two hypotheses about the expected impact of selecting between classification rules of differing levels of generality in the absence of other evidence about their likely relative performance on unseen data. It is argued that the accuracy on unseen data of the more general rule will tend to be closer to that of a default rule for the class than will that of the more specific rule. It is also argued that the accuracy on unseen cases of the more specific rule will tend to be closer to the accuracy obtained on training data than will the accuracy of the more general rule. Experimental evidence is provided in support of these hypotheses. We argue that these hypotheses can be of use in selecting appropriate learning biases to achieve specific learning objectives.Keywords: Occams Razor, Generality
83.Wang, Z. and G. I. Webb (2002). A Heuristic Lazy Bayesian Rules Algorithm. In S.J Simoff, G.J Williams and M. Hegland (Eds.), Proceedings of the First Australasian Data Mining Workshop (AusDM02) Canberra, Australia. Sydney: University of Technology, pages 57-63.[PDF] Abstract:Lazy Bayesian rule has demonstrated outstanding classification accuracy. However, it has high computational overheads when large numbers of instances are classified from a single training set. We compare lazy Bayesian rule and the tree-augmented Bayesian classifier, and present a new heuristic lazy Bayesian rule classifier that combines elements of the two. It requires less computation than lazy Bayesian rule, but demonstrates similar prediction accuracy.Keywords: Conditional Probability Estimation
82.Webb, G. I., J. Boughton, and Z. Wang (2002). Averaged One-Dependence Estimators: Preliminary Results. In S.J Simoff, G.J Williams and M. Hegland (Eds.), Proceedings of the First Australasian Data Mining Workshop (AusDM02) Canberra, Australia. Sydney: University of Technology, pages 65-73.[PDF] Abstract:Naive Bayes is a simple, computationally efficient and remarkably accurate approach to classification learning. These properties have led to its wide deployment in many online applications. However, it is based on an assumption that all attributes are conditionally independent given the class. This assumption leads to decreased accuracy in some applications. AODE overcomes the attribute independence assumption of naive Bayes by averaging over all models in which all attributes depend upon the class and a single other attribute. The resulting classification learning algorithm for nominal data is computationally efficient and achieves very low error rates.Keywords: Conditional Probability Estimation,AODE
81.Webb, G. I., M. J. Pazzani, and D. Billsus (2001). Machine learning for user modeling. User Modeling and User-Adapted Interaction 11. Netherlands: Springer, pages 19-20.[Pre-publication PDF][Paper via Springerlink] Abstract:At first blush, user modeling appears to be a prime candidate for straight forward application of standard machine learning techniques. Observations of the user's behavior can provide training examples that a machine learning system can use to form a model designed to predict future actions. However, user modeling poses a number of challenges for machine learning that have hindered its application in user modeling, including: the need for large data sets; the need for labelled data; concept drift; and computational complexity. This paper examines each of these issues and reviews approaches to resolving them.Keywords: Feature Based Modeling, User Modeling
80.Webb, G. I. (2001). Discovering Associations with Numeric Variables. In F. Provost and R. Srikant (Eds.), Proceedings of the Seventh ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD-2001)[short paper] San Francisco, CA. New York: The Association for Computing Machinery, pages 383-388.[Pre-publication PDF][Link to paper via ACM Portal] Abstract:This paper further develops Aumann and Lindell's [3] proposal for a variant of association rules for which the consequent is a numeric variable. It is argued that these rules can discover useful interactions with numeric data that cannot be discovered directly using traditional association rules with discretization. Alternative measures for identifying interesting rules are proposed. Efficient algorithms are presented that enable these rules to be discovered for dense data sets for which application of Auman and Lindell's algorithm is infeasible.Keywords: Impact rules, OPUS
79.Yang, Y. and G.I. Webb (2001). Proportional K-Interval Discretization for Naive-Bayes Classifiers. In L. DeRaedt and P. A. Flach (Eds.), Lecture Notes in Computer Science 2167: Proceedings of the 12th European Conference on Machine Learning (ECML'01) Freiburg, Germany. Berlin/Heidelberg: Springer-Verlag, pages 564-575.[Pre-Publication PDF][Paper via Springerlink] Abstract:This paper argues that two commonly-used discretization approaches, fixed k-interval discretization and entropy-based discretization have sub-optimal characteristics for naive-Bayes classification. This analysis leads to a new discretization method, Proportional k-Interval Discretization (PKID), which adjusts the number and size of discretized intervals to the number of training instances, thus seeks an appropriate trade-off between the bias and variance of the probability estimation for naive-Bayes classifiers. We justify PKID in theory, as well as test it on a wide cross-section of datasets. Our experimental results suggest that in comparison to its alternatives, PKID provides naive-Bayes classifiers competitive classification performance for smaller datasets and better classification performance for larger datasets.Keywords: PKID
78.Wang, Z., G. I. Webb, and H. Dai (2001). Implementation of Lazy Bayesian Rules in the Weka System. In Software Technology Catering for 21st Century: Proceedings of the International Symposium on Future Software Technology (ISFST2001) Zheng Zhou, China. Tokyo: Software Engineers Association, pages 204-208. Abstract:The nave Bayesian classification algorithms were shown to be computationally efficient and surprisingly accurate when the conditional independence assumption on which they are based is violated. The lazy Bayesian rule is the application of lazy learning techniques to Bayesian tree induction, which supports a weaker conditional attribute independence assumption. The Weka system is a full, industrial-strength implementation of essentially almost the state-of-the-art machine learning techniques, and it contains a framework, in the form of a Java class library, which supports applications that use embedded machine learning and even the implementation of new learning schemes. In this paper, we mainly discuss the implementation of the algorithm of lazy Bayesian rule in Weka System, and introduce all the methods to be used in the Java class. This is the first lazy learning scheme implemented in Weka System.Keywords: Conditional Probability Estimation
77.Webb, G. I. (2001). Candidate Elimination Criteria for Lazy Bayesian. In M. Stumptner, D. Corbett and M.J. Brooks (Eds.), Lecture Notes in Computer Science Vol. 2256: Proceedings of the 14th Australian Joint Conference on Artificial Intelligence (AI'01) Adelaide, Australia. Berlin/Heidelberg: Springer, pages 545-556.[Pre-publication PDF][Link to paper via Springerlink] Abstract:Lazy Bayesian Rules modify naive Bayesian classification to undo elements of the harmful attribute independence assumption. It has been shown to provide classification error comparable to boosting decision trees. This paper explores alternatives to the candidate elimination criterion employed within Lazy Bayesian Rules. Improvements over naive Bayes are consistent so long as the candidate elimination criteria ensures there is sufficient data for accurate probability estimation. However, the original candidate elimination criterion is demonstrated to provide better overall error reduction than the use of a minimum data subset size criterion.Keywords: machine learning
76.Webb, G.I. and S. Zhang (2001). Further Pruning for Efficient Association Rule Discovery. In M. Stumptner, D. Corbett and M.J. Brooks (Eds.), Lecture Notes in Computer Science Vol. 2256: Proceedings of the 14th Australian Joint Conference on Artificial Intelligence (AI'01) Adelaide, Australia. Berlin: Springer, pages 605-618.[Pre-publication PDF][Link to paper via Springerlink] Abstract:The Apriori algorithm's frequent itemset approach has become the standard approach to discovering association rules. However, the computation requirements of the frequent itemset approach are infeasible for dense data and the approach is unable to discover infrequent associations. OPUS\_AR is an efficient algorithm for rule discovery that does not utilize frequent itemsets and hence avoids these problems. It can reduce search time by using additional constraints on the search space as well as constraints on itemset frequency. However, the effectiveness of the pruning rules used during search will determine the efficiency of its search. This paper presents and analyzes pruning rules for use with OPUS\_AR. We demonstrate that application of OPUS\_AR is feasible for a number of datasets for which application of the frequent itemset approach is infeasible and that the new pruning rules can reduce compute time by more than 40%.Keywords: OPUS
75.Webb, G. I. (2000). MultiBoosting: A Technique for Combining Boosting and Wagging. Machine Learning 40(2). Netherlands: Springer, pages 159-196.[Pre-publication PDF][Paper via Springerlink] Abstract:MultiBoosting is an extension to the highly successful AdaBoost technique for forming decision committees. MultiBoosting can be viewed as combining AdaBoost with wagging. It is able to harness both AdaBoost's high bias and variance reduction with wagging's superior variance reduction. Using C4.5 as the base learning algorithm, Multi-boosting is demonstrated to produce decision committees with lower error than either AdaBoost or wagging significantly more often than the reverse over a large representative cross-section of UCI data sets. It offers the further advantage over AdaBoost of suiting parallel execution.Keywords: Multiboosting, Boosting, Bias-variance
74.Zheng, Z. and G. I. Webb (2000). Lazy Learning of Bayesian Rules. Machine Learning 41(1). Netherlands: Springer, pages 53-84.[Pre-publication PDF][Link to paper via Springerlink] Abstract:The naive Bayesian classifier provides a simple and effective approach to classifier learning, but its attribute independence assumption is often violated in the real world. A number of approaches have sought to alleviate this problem. A Bayesian tree learning algorithm builds a decision tree, and generates a local naive Bayesian classifier at each leaf. The tests leading to a leaf can alleviate attribute integradependencies for the local naive Bayesian classifier. However, Bayesian tree learning still suffers from the small disjunct problem of tree learning. While inferred Bayesian trees demonstrate low average prediction error rates, there is reason to believe that error rates will be higher for those leaves with few training examples. This paper proposes the application of lazy learning techniques to Bayesian tree induction and presents the resulting lazy Bayesian rule learning algorithm, called LBR. For each test example, it builds a most appropriate rule with a local naive Bayesian classifier as its consequent. It is demonstrated that the computational requirements of LBR are reasonable in a wide crossselection of natural domains. Experiments with these domains show that, on average, this new algorithm obtains lower error rates significantly more often than the reverse in comparison to a naive Bayesian classifier, C4.5, a Bayesian tree learning algorithm, a constructive Bayesian classifier that eliminates attributes and constructs new attributes using Cartesian products of existing nominal attributes, and a lazy decision tree learning algorithm. It also outperforms, although the result is not statistically significant, a selective naive Bayesian classifier.Keywords: Conditional Probability Estimation, Bayesian Learning, Lazy Bayesian Rules, Lazy Learning
73.Webb, G. I. (2000). Efficient Search for Association Rules. In R. Ramakrishnan and S. Stolfo (Eds.), Proceedings of the Sixth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD-2000) Boston, MA. New York: The Association for Computing Machinery, pages 99-107.[Pre-publication PDF][Link to paper via ACM Portal] Abstract:This paper argues that for some applications direct search for association rules can be more efficient than the two stage process of the Apriori algorithm which first finds large item sets which are then used to identify associations. In particular, it is argued, Apriori can impose large computational overheads when the number of frequent itemsets is very large. This will often be the case when association rule analysis is performed on domains other than basket analysis or when it is performed for basket analysis with basket information augmented by other customer information. An algorithm is presented that is computationally efficient for association rule analysis during which the number of rules to be found can be constrained and all data can be maintained in memory.Keywords: Search, OPUS
72.Webb, G. I., J. Wells, and Z. Zheng (1999). An Experimental Evaluation of Integrating Machine Learning with Knowledge Acquisition. Machine Learning 35(1). Netherlands: Springer, pages 5-24.[Pre-publication PDF][Link to paper via Springerlink] Abstract:Machine learning and knowledge acquisition from experts have distinct capabilities that appear to complement one another. We report a study that demonstrates the integration of these approaches can both improve the accuracy of the developed knowledge base and reduce development time. In addition, we found that users expected the expert systems created through the integrated approach to have higher accuracy than those created without machine learning and rated the integrated approach less difficult to use. They also provided favorable evaluations of both the specific integrated software, system called The Knowledge Factory, and of the general value of machine learning for knowledge acquisition.Keywords: Machine Learning with Knowledge Acquisition from Experts, Rule Learning
71.Webb, G. I. (1999). Decision Tree Grafting From The All Tests But One Partition. In T. Dean (Ed.), Proceedings of the Sixteenth International Joint Conference on Artificial Intelligence (IJCAI 99) Stockholm, Sweden. San Francisco: Morgan Kaufmann, pages 702-707.[PDF][Reproduced with permision of IJCAI Inc.] Abstract:Decision tree grafting adds nodes to an existing decision tree with the objective of reducing prediction error. A new grafting algorithm is presented that considers one set of training data only for each leaf of the initial decision tree, the set of cases that fail at most one test on the path to the leaf. This new technique is demonstrated to retain the error reduction power of the original grafting algorithm while dramatically reducing compute time and the complexity of the inferred tree. Bias/variance analysis reveal that the original grafting technique operated primarily by variance reduction while the new technique reduces both bias and variance.Keywords: Decision Tree Learning, Decision Tree Grafting, Occams Razor
70.Zheng, Z., G. I. Webb, and K. M. Ting (1999). Lazy Bayesian Rules: A Lazy Semi-Naive Bayesian Learning Technique Competitive to Boosting Decision Trees. In I. Bratko and S. Dzeroski (Eds.), Proceedings of the Sixteenth International Conference on Machine Learning (ICML-99) Bled, Slovenia. San Francisco: Morgan Kaufmann, pages 493-502.[Pre-publication PDF] Abstract:LBR is a lazy semi-naive Bayesian classifier learning technique, designed to alleviate the attribute interdependence problem of naive Bayesian classification. To classify a test example, it creates a conjunctive rule that selects a most appropriate subset of training examples and induces a local naive Bayesian classifier using this subset. LBR can significantly improve the performance of the naive Bayesian classifier. A bias and variance analysis of LBR reveals that it significantly reduces the bias of naive Bayesian classification at a cost of a slight increase in variance. It is interesting to compare this lazy technique with boosting and bagging, two well-known state-of-the-art non-lazy learning techniques. Empirical comparison of LBR with boosting decision trees on discrete valued data shows that LBR has, on average, significantly lower variance and higher bias. As a result of the interaction of these effects, the average prediction error of LBR over a range of learning tasks is at a level directly comparable to boosting. LBR provides a very competitive discrete valued learning technique where error minimization is the primary concern. It is very efficient when a single classifier is to be applied to classify few cases, such as in a typical incremental learning scenario.Keywords: Conditional Probability Estimation, Bayesian Learning, Lazy Bayesian Rules, Lazy Learning
69.Zheng, Z. and G.I. Webb (1999). Stochastic Attribute Selection Committees with Multiple Boosting: Learning More Accurate and More Stable Classifier Committees. In N. Zhong and L. Zhou (Eds.), Lecture Notes in Computer Science 1574: Methodologies for Knowledge Discovery and Data Mining - Proceedings of the Third Pacific-Asia Conference (PAKDD'99) Beijing, China. Berlin/Heidelberg: Springer-Verlag, pages 123-132.[Pre-publication PDF][Link to paper via Springerlink] Abstract:Classifier learning is a key technique for KDD. Approaches to learning classifier committees, including Boosting, Bagging, SASC, and SASCB, have demonstrated great success in increasing the prediction accuracy curacy of decision trees. Boosting and Bagging create different classifiers by modifying the distribution of the training set. SASC adopts a different method. It generates committees by stochastic manipulation of the set of attributes considered at each node during tree induction, but keeping the distribution of the training set unchanged. SASCB, a combination of Boosting and SASC, has shown the ability to further increase, on average, the prediction accuracy of decision trees. It has been found that the performance of SASCB and Boosting is more variable than that of SASC, although SASCB is more accurate than the others on average. In this paper, we present a novel method to reduce variability of SASCB and Boosting, and further increase their average accuracy. It generates multiple committees by incorporating Bagging into SASCB. As well as improving stability and average accuracy, the resulting method is amenable to parallel or distributed processing, while Boosting and SascB are not. This is an important characteristic for datamining in large datasets.Keywords: Multiboosting, Boosting, Stochastic Attribute Selection committees
68.Newlands, D. and G. I. Webb (1999). Convex Hulls in Concept Induction. In N. Zhong and L. Zhou (Eds.), Lecture Notes in Computer Science 1574: Methodologies for Knowledge Discovery and Data Mining - Proceedings of the Third Pacific-Asia Conference (PAKDD'99) Beijing, China. Berlin/Heidelberg: Springer-Verlag, pages 306-316.[Link to paper via Springerlink] Abstract:This paper investigates modelling concepts as a few, large convex hulls rather than as many, small, axis-orthogonal divisions as is done by systems which currently dominate classification learning. It is argued that this approach produces classifiers which have less strong hypothesis language bias and which, because of the fewness of the concepts induced, are more understandable. The design of such a system is described and its performance is investigated. Convex hulls are shown to be a useful inductive generalisation technique offering rather different biases than well-known systems such as C4.5 and CN2. The types of domains where convex hulls can be usefully employed are described.
67.Chiu, B. C. and G. I. Webb (1999). Dual-Model: An Architecture for Utilizing Temporal Information in Student Modeling. In G. Cumming, T. Okamoto and L. Gomez (Eds.), Proceedings of the Seventh International Conference on Computers in Education (ICCE '99), volume 1 Chiba, Japan.(Also appeared in the Proceedings of ACAI Workshop W03: Machine Learning in User Modeling, pp 46-53). Amsterdam: IOS Press, pages 111-118.[Pre-Publication PDF] Abstract:A modeling system may be required to predict an agent's future actions even when confronted by inadequate or contradictory relevant evidence from observations of past actions. This can result in low prediction accuracy, or otherwise, low prediction rates, leaving a set of cases for which no predictions are made. This raises two issues. First, when maximizing prediction rate is preferable, what mechanisms can be employed such that a system can make more predictions without severely degrading prediction accuracy? Second, for contexts in which accuracy is of primary importance, how can we further improve prediction accuracy? A recently proposed Dual-model approach, which takes models' temporal characteristics into account, suggests a solution to the first problem, but leaves room for further improvement. This paper presents two classes of Dual-model variant. Each aims to achieve one of the above objectives. With the performance of the original system as a baseline, which does not utilize the temporal information, empirical evaluations in the domain of elementary subtraction show that one class of variant outperforms the baseline in prediction rate while the other does so in prediction accuracy, without significantly affecting other overall measures of the original performance. Keywords: Agent modeling, Student modeling, Temporal model, Decision tree.Keywords: Feature Based Modeling, User Modeling
66.Smith, P. and G. I. Webb (1999). Evaluation of Low-Level Program Visualisation for Teaching Novice C Programmers. In G. Cumming, T. Okamoto and L. Gomez (Eds.), Proceedings of the Seventh International Conference on Computers in Education (ICCE '99), volume 2 Chiba, Japan. Amsterdam: IOS Press, pages 385-392.[Pre-publication PDF] Abstract:While several program visualisation tools aimed at novice programmers have been developed over the past decade there is little empirical evidence showing that novices actually benefit from their use (Mulholland, 1995). Bradman (Smith & Webb, 1998) is a low-level program visualisation tool. We present an experiment that tests the efficacy of Bradman in assisting novice programmers learn programming concepts. We show that students with access to this lowlevel program visualisation tool achieved greater understanding of some programming concepts than those without access.
65.Ting, K.M. and Z. Zheng, & G. I. Webb (1999). Learning Lazy Rules to Improve the Performance of Classifiers. In F. Coenen and A. Macintosh (Eds.), Proceedings of the Nineteenth SGES International Conference on Knowledge Based Systems and Applied Artificial Intelligence (ES'99) Peterhouse College, Cambridge, UK. New York: Springer, pages 122-131.[Pre-publication PDF] Abstract:Based on an earlier study on lazy Bayesian rule learning, this paper introduces a general lazy learning framework, called LAZYRULE, that begins to learn a rule only when classifying a test case. The objective of the framework is to improve the performance of a base learning algorithm. It has the potential to be used for different types of base learning algorithms. LAZYRULE performs attribute elimination and training case selection using cross-validation to generate the most appropriate rule for each test case. At the consequent of the rule, it applies the base learning algorithm on the selected training subset and the remaining attributes to construct a classifier to make a prediction. This combined action seeks to build a better performing classifier for each test case than the classifier trained using all attributes and all training cases. We show empirically that LAZYRULE improves the performances of naive Bayesian classifiers and majority vote.Keywords: Conditional Probability Estimation
64.Brain, D. and G. I. Webb (1999). On The Effect of Data Set Size on Bias And Variance in Classification Learning. In D. Richards, G. Beydoun, A. Hoffmann and P. Compton (Eds.), Proceedings of the Fourth Australian Knowledge Acquisition Workshop (AKAW '99) Sydney, Australia. Sydney: The University of New South Wales, pages 117-128.[Pre-publication PDF] Abstract:With the advent of data mining, machine learning has come of age and is now a critical technology in many businesses. However, machine learning evolved in a different research context to that in which it now finds itself employed. A particularly important problem in the data mining world is working effectively with large data sets. However, most machine learning research has been conducted in the context of learning from very small data sets. To date most approaches to scaling up machine learning to large data sets have attempted to modify existing algorithms to deal with large data sets in a more computationally efficient and effective manner. But is this necessarily the best method? This paper explores the possibility of designing algorithms specifically for large data sets. Specifically, the paper looks at how increasing data set size affects bias and variance error decompositions for classification algorithms. Preliminary results of experiments to determine these effects are presented, showing that, as hypothesized variance can be expected to decrease as training set size increases. No clear effect of training set size on bias was observed. These results have profound implications for data mining from large data sets, indicating that developing effective learning algorithms for large data sets is not simply a matter of finding computationally efficient variants of existing learning algorithms.Keywords: Learning from large datasets
63.Chiu, B. C. and G. I. Webb (1998). Using Decision Trees For Agent Modelling: Improving Prediction Performance. User Modeling and User-Adapted Interaction 8(1-2). Netherlands: Springer, pages 131-152.[Pre-publication PDF][Paper via Springerlink] Abstract:A modeling system may be required to predict an agents future actions under constraints of inadequate or contradictory relevant historical evidence. This can result in low prediction accuracy, or otherwise, low prediction rates, leaving a set of cases for which no predictions are made. A previous study that explored techniques for improving prediction rates in the context of modeling students subtraction skills using Feature Based Modeling showed a tradeoff between prediction rate and predication accuracy. This paper presents research that aims to improve prediction rates without affecting prediction accuracy. The FBM-C4.5 agent modeling system was used in this research. However, the techniques explored are applicable to any Feature Based Modeling system, and the most effective technique developed is applicable to most agent modeling systems. The default FBM-C4.5 system models agents competencies with a set of decision trees, trained on all historical data. Each tree predicts one particular aspect of the agents action. Predictions from multiple trees are compared for consensus. FBM-C4.5 makes no prediction when predictions from different trees contradict one another. This strategy trades off reduced prediction rates for increased accuracy. To make predictions in the absence of consensus, three techniques have been evaluated. They include using voting, using a tree quality measure and using a leaf quality measure. An alternative technique that merges multiple decision trees into a single tree provides an advantage of producing models that are more comprehensible. However, all of these techniques demonstrated the previous encountered trade-off between rate of prediction and accuracy of prediction, albeit less pronounced. It was hypothesized that models built on more current observations would outperform models built on earlier observations. Experimental results support this hypothesis. A Dual-model system, which takes this temporal factor into account, has been evaluated. This fifth approach achieved a significant improvement in prediction rate without significantly affecting prediction accuracy.Keywords: Feature Based Modeling, User Modeling
62.Webb, G.I. and M. Kuzmycz (1998). Evaluation Of Data Aging: A Technique For Discounting Old Data During Student Modeling. In B.P. Goettl, H. M. Halff, C. Redfield and V. Shute (Eds.), Lecture Notes in Computer Science Vol. 1452: Proceedings of the Fourth International Conference on Intelligent Tutoring Systems (ITS '98) San Antonio, Texas. Berlin: Springer-Verlag, pages 384-393.[Pre-publication PDF][LNCS Volume via Springerlink] Abstract:Student modeling systems must operate in an environment in which a student's mastery of a subject matter is likely to change as a lesson progresses. A student model is formed from evaluation of evidence about the student's mastery of the domain. However, given that such mastery will change, older evidence is likely to be less valuable than recent evidence. Data aging addresses this issue by discounting the value of older evidence. This paper provides experimental evaluation of the effects of data aging. While it is demonstrated that data aging can result in statistically significant increases in both the number and accuracy of predictions that a modeling system makes, it is also demonstrated that the reverse can be true. Further, the effects experienced are of only small magnitude. It is argued that these results demonstrate some potential for data aging as a general strategy, but do not warrant employing data aging in its current form.Keywords: Feature Based Modeling, User Modeling
61.Viswanathan, M. and G.I. Webb (1998). Classification Learning Using All Rules. In C. Nedellec and C. Rouveiro (Eds.), Lecture Notes in Computer Science 1398: Proceedings of the Tenth European Conference on Machine Learning (ECML'98) Chemnitz, Germany. Berlin/Heidelberg: Springer, pages 149-159.[Pre-publication PDF] Abstract:The covering algorithm has been ubiquitous in the induction of classification rules. This approach to machine learning uses heuristic search that seeks to find a minimum number of rules that adequately explains the data. However, recent research has provided evidence that learning redundant classifiers can increase predictive accuracy. Learning all possible classifiers seems to be a plausible form of this nomination of redundant classifiers. This paper presents an algorithm that in effect learns all classifiers. Preliminary investigations by Webb (1996b) suggest that a heuristic covering algorithm in general learns classification rules with higher predictive accuracy than those learned by this new approach. In this paper we present an extensive empirical comparison between the learning-all-rules algorithm and three varied established approaches to inductive learning, namely a covering algorithm, an instance-based learner and a decision tree learner. Empirical evaluation provides strong evidence in support of learning-all-rules as a plausible approach to inductive learning.Keywords: Lazy Learning, Rule Learning
60.Smith, P. and G. I. Webb (1998). Overview of a Low-Level Program Visualisation Tool for Novice Programmers. In Proceedings of the Sixth International Conference on Computers in Education (ICCE '98) Beijing. Berlin: Springer-Verlag, pages 213-216.[Pre-publication PDF] Abstract:As a programming novice attempts to attain expertise in programming she must develop adequate mental models and knowledge structures of the programming process. Unfortunately, many of the computerised tools to which novice programmers have access are designed by expert programmers for experts and as such do not meet the needs of novices. Low-level program visualisation tools make explicit the internal workings of program execution and as such can serve as conceptual models onto which novices can assimilate information about programming. This paper discusses the need for such a tool, what features such a tool may include and gives a brief description of an evaluation of a low-level program visualisation tool developed at Deakin University.
59.Zheng, Z., G. I. Webb, and K. M. Ting (1998). Integrating Boosting and Stochastic Attribute Selection Committees for Further Improving The Performance of Decision Tree Learning. In Proceedings of the Tenth IEEE International Conference on Tools with Artificial Intelligence (ICTAI98) Taipei, Taiwan. Los Alamitos, CA: IEEE Computer Society Press, pages 216-223.[PDF][Link to paper via IEEE xplore] Abstract:Techniques for constructing classifier committees including boosting and bagging have demonstrated great success, especially boosting for decision tree learning. This type of technique generates several classifiers to form a committee by repeated application of a single base learning algorithm. The committee members vote to decide the final classification. Boosting and bagging create different classifiers by modifying the distribution of the training set. SASC (Stochastic Attribute Selection Committees) uses an alternative approach to generating classifier committees by stochastic manipulation of the set of attributes considered at each node during tree induction, but keeping the distribution of the training set unchanged. We propose a method for improving the performance of boosting. This technique combines boosting and SASC. It builds classifier committees by manipulating both the distribution of the training set and the set of attributes available during induction. In the synergy SASC effectively increases the model diversity of boosting. Experiments with a representative collection of natural domains show that, on average, the combined technique outperforms either boosting or SASC alone in terms of reducing the error rate of decision tree learning.Keywords: Multiboosting, Boosting, Stochastic Attribute Selection Committees
58.Zheng, Z. and G. I. Webb (1998). Multiple Boosting: A Combination of Boosting and Bagging. In Proceedings of the 1998 International Conference on Parallel and Distributed Processing Techniques and Applications (PDPTA'98) Las Vegas, Nevada. CSREA Press, pages 1133-1140.[Pre-publication PDF] Abstract:Classifier committee learning approaches have demonstrated great success in increasing the prediction accuracy of classifier learning, which is a key technique for datamining. These approaches generate several classifiers to form a committee by repeated application of a single base learning algorithm. The committee members vote to decide the final classification. It has been shown that Boosting and Bagging, as two representative methods of this type, can significantly decrease the error rate of decision tree learning. Boosting is generally more accurate than Bagging, but the former is more variable than the latter. In addition, bagging is amenable to parallel or distributed processing, while Boosting is not. In this paper, we study a new committee learning algorithm, namely MB (Multiple Boosting). It creates multiple subcommittees by combining Boosting and Bagging. Experimental results in a representative collection of natural domains show that MB is, on average, more accurate than either Bagging or Boosting alone. It is more stable than Boosting, and is amenable to parallel or distributed processing. These characters characteristics make MB a good choice for parallel datamining ing.Keywords: MultiBoosting
57.Webb, G. I. (1998). The Problem of Missing Values in Decision Tree Grafting. In G. Antoniou and J.K. Slaney (Eds.), Lecture Notes in Computer Science Vol. 1502: Advanced Topics in Artificial Intelligence, Selected Papers from the Eleventh Australian Joint Conference on Artificial Intelligence (AI '98) Brisbane, Australia. Berlin: Springer-Verlag, pages 273-283.[Pre-publication PDF] Abstract:Decision tree grafting adds nodes to inferred decision trees. Previous research has demonstrated that appropriate grafting techniques can improve predictive accuracy across a wide crossselection of domains. However, previous decision tree grafting systems are demonstrated to have a serious deficiency for some data sets containing missing values. This problem arises due to the method for handling missing values employed by C4.5, in which the grafting systems have been embedded. This paper provides an explanation of and solution to the problem. Experimental evidence is presented of the efficacy of this solution.Keywords: Decision Tree Learning, Decision Tree Grafting, Occams Razor
56.Webb, G. I. and M. Pazzani (1998). Adjusted Probability Naive Bayesian Induction. In G. Antoniou and J.K. Slaney (Eds.), Lecture Notes in Computer Science Vol. 1502: Advanced Topics in Artificial Intelligence, Selected Papers from the Eleventh Australian Joint Conference on Artificial Intelligence (AI '98) Brisbane, Australia. Berlin: Springer-Verlag, pages 285-295.[Pre-publication PDF] Abstract:Naive Bayesian classifiers utilise a simple mathematical model for induction. While it is known that the assumptions on which this model is based are frequently violated, the predictive accuracy obtained in discriminate classification tasks is surprisingly competitive in comparison to more complex induction techniques. Adjusted probability naive Bayesian induction adds a simple extension to the naive Bayesian classifier. A numeric weight is inferred for each class. During discriminate classification, the naive Bayesian probability of a class is multiplied by its weight to obtain an adjusted value. The use of this adjusted value in place of the naive Bayesian probability is shown to significantly improve predictive accuracy.Keywords: Bayesian Learning
55.Zheng, Z. and G. I. Webb (1998). Stochastic Attribute Selection Committees. In G. Antoniou and J.K. Slaney (Eds.), Lecture Notes in Computer Science Vol. 1502: Advanced Topics in Artificial Intelligence, Selected Papers from the Eleventh Australian Joint Conference on Artificial Intelligence (AI '98) Brisbane, Australia. Berlin: Springer-Verlag, pages 321-332.[Pre-publication PDF] Abstract:Classifier committee learning methods generate multiple classifiers to form a committee by repeated application of a single base learning algorithm. The committee members vote to decide the final classification. Two such methods, Bagging and Boosting, have shown great success with decision tree learning. They create different classifiers by modifying the distribution of the training set. This paper studies a different approach: Stochastic Attribute Selection Committee learning of decision trees. It generates classifier committees by stochastically modifying the set of attributes but keeping the distribution of the training set unchanged. An empirical evaluation of a variant of this method, namely Sasc, in a representative collection of natural domains shows that the SASC method can significantly reduce the error rate of decision tree learning. On average Sasc is more accurate than Bagging and less accurate than Boosting, although a one-tailed signtest fails to show that these differences are significant at a level of 0.05. In addition, it is found that, like Bagging, Sasc is more stable than Boosting in terms of less frequently obtaining significantly higher error rates than C4.5 and, when error is raised, producing lower error rate increases. Moreover, like Bagging, Sasc is amenable to parallel and distributed processing while Boosting is not.Keywords: MultiBoosting, Stochastic Attribute Selection Committees
54.Webb, G. I., B. C. Chiu, and M. Kuzmycz (1997). Comparative Evaluation of Alternative Induction Engines for Feature Based Modelling. International Journal of Artificial Intelligence in Education 8. NAmsterdam: IOS Press, pages 97-115.[Link to paper via IJAIED] Abstract:Feature Based Modelling has demonstrated the ability to produce agent models with high accuracy in predicting an agent's future actions. There are a number of respects in which this modelling technique is novel. However, there has been no previous analysis of which aspects of the approach are responsible for its performance. One distinctive feature of the approach is a purpose built induction module. This paper presents a study in which the original custom built Feature Based Modelling induction module was replaced by the C4.5 machine learning system. Comparative evaluation shows that the use of C4.5 increases the number of predictions made without significantly altering the accuracy of those predictions. This suggests that it is the general input-output agent modelling methodology used with both systems that has primary responsibility for the high predictive accuracy previously reported for Feature Based Modelling, rather than its initial idiosyncratic induction technique.Keywords: Feature Based Modeling, User Modeling
53.Webb, G. I. (1997). Decision Tree Grafting. In Proceedings of the Fifteenth International Joint Conference on Artificial Intelligence (IJCAI 97) Nagoya, Japan. San Francisco: Morgan Kaufmann, pages 846-851.[PDF][Reproduced with permision of IJCAI Inc.] Abstract:This paper extends recent work on decision tree grafting. Grafting is an inductive process that adds nodes to inferred decision trees. This process is demonstrated to frequently improve predictive accuracy. Superficial analysis might suggest that decision tree grafting is the direct reverse of pruning. To the contrary, it is argued that the two processes are complementary. This is because, like standard tree growing techniques, pruning uses only local information, whereas grafting uses non-local information. The use of both pruning and grafting in conjunction is demonstrated to provide the best general predictive accuracy over a representative selection of learning tasks.Keywords: Decision Trees, Decision Tree Grafting, Occams Razor
52.Chiu, B.C., G.I. Webb, and M. Kuzmycz (1997). A Comparison of First-Order and Zeroth-Order Induction for Input-Output Agent Modelling. In A. Jameson, C. Paris and C. Tasso (Eds.), Proceedings of the Sixth International Conference on User Modeling (UM'97) Chia Laguna, Sardinia. New York/Vienna: Springer, pages 347-358.[Paper via UM.org] Abstract:Most student modelling systems seek to develop a model of the internal operation of the cognitive system. In contrast, Input-Output Agent Modelling (IOAM) models an agent in terms of relationships between the inputs and outputs of the cognitive system. Previous IOAM systems have demonstrated high predictive accuracy in the domain of elementary subtraction. These systems use zeroth-order induction. Many of the predicates used, however, represent relations. This suggests that first-order induction might perform well in this domain. This paper reports a study in which zeroth-order and first-order induction engines were used to build models of student subtraction skills. Comparative evaluation shows that zeroth-order induction performs better than first-order in detecting regularities indicating misconceptions while first-order induction leads zeroth-order in detecting regularities indicating correct concepts and inducing a more comprehensible student model. This suggests there exists a trade-off between these factors and that there is still scope for improvement.Keywords: Feature Based Modeling, User Modeling
51.Chiu, B. C., G. I. Webb, and Z. Zheng (1997). Using Decision Trees for Agent Modelling: A Study on Resolving Conflicting Predictions. In A. Sattar (Ed.), Lecture Notes in Computer Science Vol. 1342: Proceedings of the Tenth Australian Joint Conference on Artificial Intelligence (AI'97) Perth, Australia. Berlin: Springer-Verlag, pages 349-358.[Pre-Publication PDF] Abstract:Input-Output Agent Modelling (IOAM) is an approach to modelling an agent in terms of relationships between the inputs and outputs of the cognitive system. This approach, together with a leading inductive learning algorithm, C4.5, has been adopted to build a subtraction skill modeller, C4.5-IOAM. It models agents' competencies with a set of decision trees. C4.5-IOAM makes no prediction when predictions from different decision trees are contradictory. This paper proposes three techniques for resolving such situations. Two techniques involve selecting the more reliable prediction from a set of competing predictions using a free quality measure and a leaf quality measure. The other technique merges multiple decision trees into a single tree. This has the additional advantage of producing more comprehensible models. Experimental results, in the domain of modelling elementary subtraction skills, showed that the tree quality and the leaf quality of a decision path provided valuable references for resolving contradicting predictions and a single tree model representation performed nearly equally well to the multi-tree model representation.Keywords: Feature Based Modeling, User Modeling
50.Webb, G. I. (1996). Further Experimental Evidence Against The Utility Of Occams Razor. Journal of Artificial Intelligence Research 4. Menlo Park, CA: AAAI Press, pages 397-417.[Link to paper via JAIR website] Abstract:This paper presents new experimental evidence against the utility of Occam's razor. A systematic procedure is presented for post-processing decision trees produced by C4.5. This procedure was derived by rejecting Occam's razor and instead attending to the assumption that similar objects are likely to belong to the same class. It increases a decision tree's complexity without altering the performance of that tree on the training data from which it is inferred. The resulting more complex decision trees are demonstrated to have, on average, for a variety of common learning tasks, higher predictive accuracy than the less complex original decision trees. This result raises considerable doubt about the utility of Occam's razor as it is commonly applied in modern machine learning.Keywords: Decision Trees, Decision Tree Grafting, Occams Razor
49.Webb, G.I. and M. Kuzmycz (1996). Feature Based Modelling: A Methodology for Producing Coherent, Consistent, Dynamically Changing Models of Agents Competencies. User Modelling and User-Adapted Interaction 5(2). Netherlands: Springer, pages 117-150.[Pre-publication PDF][Link to paper via Springerlink] Abstract:Feature Based Modelling uses attribute value machine learning techniques to model an agent's competency. This is achieved by creating a model describing the relationships between the features of the agent's actions and of the contexts in which those actions are performed. This paper describes techniques that have been developed for creating these models and for extracting key information therefrom. An overview is provided of previous studies that have evaluated the application of Feature Based Modelling in a number of educational contexts including piano keyboard playing, the unification of Prolog terms and elementary subtraction. These studies have demonstrated that the approach is applicable to a wide spectrum of domains. Classroom use has demonstrated the low computational overheads of the technique. A new study of the application of the approach to modelling elementary subtraction skills is presented. The approach demonstrates accuracy in excess of 90% when predicting student solutions. It also demonstrates the ability to identify and model student's buggy arithmetic procedures.Keywords: Feature Based Modeling
48.Webb, G. I. (1996). Integrating Machine Learning With Knowledge Acquisition Through Direct Interaction With Domain Experts. Knowledge-Based Systems 9. Elsevier, pages 253-266.[Pre-publication PDF][Paper via Science Direct] Abstract:Knowledge elicitation from experts and empirical machine learning are two distinct approaches to knowledge acquisition with differing and mutually complementary capabilities. Learning apprentices have provided environments in which a knowledge engineer may collaborate with a machine learning system allowing, for a synergy between the complementary approaches. The Knowledge Factory is a knowledge acquisition environment that allows a domain expert to collaborate directly with a machine learning system without the need for assistance from a knowledge engineer. This requires a different form of environment to the learning apprentice. This paper describes techniques for supporting such interactions and their implementation in a knowledge acquisition environment called The Knowledge Factory.Keywords: Machine Learning with Knowledge Acquisition from Experts, Rule Learning
47.Webb, G. I. (1996). Cost Sensitive Specialisation. In N.Y. Foo and R. Goebel (Eds.), Lecture Notes in Computer Science Vol. 1114. Topics in Artificial Intelligence: Proceedings of the Fourth Pacific Rim International Conference on Artificial Intelligence (PRICAI'96) Cairns, Australia. Berlin/Heidelberg: Springer-Verlag, pages 23-34.[Pre-publication PDF][Link to paper via Springerlink] Abstract:Cost-sensitive specialization is a generic technique for misclassification cost sensitive induction. This technique involves specializing aspects of a classifier associated with high misclassification costs and generalizing those associated with low misclassification costs. It is widely applicable and simple to implement. It could be used to augment the effect of standard cost-sensitive induction techniques. It should directly extend to test application cost sensitive induction tasks. Experimental evaluation demonstrates consistent positive effects over a range of misclassification cost sensitive learning tasks.Keywords: Cost Sensitive Learning, Generality
46.Webb, G. I. (1996). Inclusive Pruning: A New Class of Pruning Rule for Unordered Search and its Application to Classification Learning. In K. Ramamohanarao (Ed.), Australian Computer Science Communications Vol. 18 (1): Proceedings of the Nineteenth Australasian Computer Science Conference (ACSC'96) Royal Melbourne Insitute of Technology, Australia. Melbourne: ACS, pages 1-10.[PDF] Abstract:This paper presents a new class of pruning rule for unordered search. Previous pruning rules for unordered search identify operators that should not be applied in order to prune nodes reached via those operators. In contrast, the new pruning rules identify operators that should be applied and prune nodes that are not reached via those operators. Specific pruning rules employing both these approaches are identified for classification learning. Experimental results demonstrate that application of the new pruning rules can reduce by more than 60% the number of states from the search space that are considered during classification learning.Keywords: Search, Rule Learning, OPUS
45.Webb, G.I. and J. Wells (1996). An Experimental Evaluation of Integrating Machine Learning with Knowledge Acquisition Through Direct Interaction with Domain Experts. In P. Compton, R. Mizoguchi, H. Motada and T. Menzies (Eds.), Proceedings of the 1996 Pacific Knowledge Acquisition Workshop (PKAW'96) Coogee, Sydney, Australia. Sydney: UNSW Press, pages 170-189.[Pre-publication PDF] Abstract:Machine learning and knowledge acquisition from experts have distinct and apparently complementary knowledge acquisition capabilities. This study demonstrates that the integration of these approaches can both improve the accuracy of the knowledge base that is developed and reduce the time taken to develop it. The system studied, called The Knowledge Factory is distinguished by the manner in which it supports direct interaction with domain experts with little or no knowledge engineering expertise. The benefits reported relate to use by such users. In addition to the improved quality of the knowledge base, in questionnaire responses the users provided favourable evaluations of the integration of machine learning with knowledge acquisition within the system.
44.Webb, G. I. (1996). A Heuristic Covering Algorithm Outperforms Learning All Rules. In Proceedings of Information, Statistics and Induction in Science (ISIS '96) Melbourne, Australia. Singapore: World Scientific, pages 20-30.[Pre-publication PDF] Abstract:The induction of classification rules has been dominated by a single generic technique-the covering algorithm. This approach employs a simple hill-climbing search to learn sets of rules. Such search is subject to numerous widely known deficiencies. Further, there is a growing body of evidence that learning redundant sets of rules can improve predictive accuracy. The ultimate end-point of a move toward learning redundant rule sets would appear to be to learn and employ all possible rules. This paper presents a learning system that does this. An empirical investigation shows that, while the approach often achieves higher predictive accuracy than a covering algorithm, the covering algorithm outperforms induction of all rules significantly more frequently. Preliminary analysis suggests that learning all rules performs well when the training set clearly defines the decision surfaces but that the heuristic covering algorithm performs better when the decision surfaces are not clearly delineated by the training examples.Keywords: Lazy Learning, Rule Learning
43.Webb, G. I. (1995). OPUS: An Efficient Admissible Algorithm For Unordered Search. Journal of Artificial Intelligence Research 3. Menlo Park, CA: AAAI Press, pages 431-465.[Link to paper via JAIR website] Abstract:OPUS is a branch and bound search algorithm that enables efficient admissible search through spaces for which the order of search operator application is not significant. The algorithm's search efficiency is demonstrated with respect to very large machine learning search spaces. The use of admissible search is of potential value to the machine learning community as it means that the exact learning biases to be employed for complex learning tasks can be precisely specified and manipulated. OPUS also has potential for application in other areas of artificial intelligence, notably, truth maintenance.Keywords: Search, Rule Learning, OPUS
42.Newlands, D. and G. I. Webb (1995). Polygonal Inductive Generalisation System. In G. Forsyth and M. Ali (Eds.), Proceedings of the Eighth International Conference on Industrial and Engineering Applications of Artificial Intelligence and Expert Systems (IEA/AIE '95) Melbourne, Australia. Newark, NJ, USA: Gordon and Breach Science Publishers, Inc, pages 587-592.[Pre-publication PDF][Link to paper via ACM Portal] Abstract:Knowledge acquisition remains one of the primary constraints on the development of expert systems. A number of researchers have explored methods for allowing a machine learning system to assist a knowledge engineer in knowledge acquisition. In contrast, we are exploring methods for enabling an expert to directly interact with a machine learning system to collaborate during knowledge acquisition. We report recent extensions to our methodology encompassing a revised model of the role of machine learning in knowledge acquisition; techniques for communication between a machine learning system and a domain expert and novel forms of assistance that a machine learning system may provide to an expert.Keywords: Convex Hulls
41.Webb, G. I. and J. Wells (1995). Recent Progress in Machine-Expert Collaboration for Knowledge Acquisition. In X. Yao (Ed.), Proceedings of the Eighth Australian Joint Conference on Artificial Intelligence (AI'95) Canberra, Australia. Singapore: World Scientific, pages 291-298.[Prepublication PDF] Abstract:Knowledge acquisition remains one of the primary constraints on the development of expert systems. A number of researchers have explored methods for allowing a machine learning system to assist a knowledge engineer in knowledge acquisition. In contrast, we are exploring methods for enabling an expert to directly interact with a machine learning system to collaborate during knowledge acquisition. We report recent extensions to our methodology encompassing a revised model of the role of machine learning in knowledge acquisition; techniques for communication between a machine learning system and a domain expert and novel forms of assistance that a machine learning system may provide to an expert.Keywords: Machine Learning with Knowledge Acquisition from Experts
40.Smith, P. and G. I. Webb (1995). Transparency Debugging with Explanations for Novice Programmers. In M. Ducass (Ed.), Proceedings of the Second International Workshop on Automated and Algorithmic Debugging (AADEBUG'95) Saint-Malo, France. IRISA-CNRS.[Pre-publication PDF] Abstract:Novice programmers often find programming to be a difficult and frustrating task. Because of their lack of experience in programming novices have different needs to experts when it comes to debugging assistants. One way a debugging assistant could be tailored to novices, as proposed by Eisenstadt, is to provide them with an explicit model of how their program works and, hence encourage them to find errors for themselves. We discuss such a transparency debugger, Bradman, that we have been developing to assist novice programmers understand and debug their C programs. We also present the results of an experiment, conducted on volunteer novice programmers, in which approximately half of the students had access to an explanation of each statement as it was executed and the other half did not. We show that access to such explanations provided beneficial results for a significant number of students.Keywords: Computer Science Education
39.Smith, P. and G. I. Webb (1995). Reinforcing a Generic Computer Model for Novice Programmers. In Proceedings of the Seventh Australian Society for Computers in Learning in Tertiary Education Conference (ASCILITE '95) Melbourne, Australia. Melbourne: ASCILITE.[Pre-publication PDF][Link to paper via ASCLITE] Abstract:Novices often find learning their first programming language to be a frustrating and difficult process. They have difficulties in developing and debugging their programs. One of their problems is that their mental model of how the computer works is inadequate. In this paper we discuss a programming assistant, called Bradman, which we are currently developing. It is aimed at novice programmers and designed to reinforce a concrete mental model of how the computer works as a program is executed. It shows explicitly how program states change as statements in the procedural language C are executed. It does this by means of graphical display together with contextualised verbal explanations of each statement.Keywords: Computer Science Education
38.Yip, S. and G. I. Webb (1994). Incorporating Canonical Discriminate Attributes in Classification Learning. In R. Elio (Ed.), Proceedings of the Tenth Biennial Canadian Artificial Intelligence Conference(AI-94) Banff, Canada. San Francisco: Morgan Kaufmann, pages 63-70.[Pre-publication PDF] Abstract:This paper describes a method for incorporating canonical discriminant attributes in classification machine learning. Though decision trees and rules have semantic appeal when building expert systems, the merits of discriminant analysis are well documented. For data sets on which discriminant analysis obtains significantly better predictive accuracy than symbolic machine learning, the incorporation of canonical discriminant attributes can benefit machine learning. The process starts by applying canonical discriminant analysis to the training set. The canonical discriminant attributes are included as additional attributes. The expanded data set is then subjected to machine learning. This enables linear combinations of numeric attributes to be incorporated in the classifiers that are learnt. Evaluation on the data sets on which discriminant analysis performs better than most machine learning systems, such as the Iris flowers and Waveform data sets, shows that incorporating the power of discriminant analysis in machine classification learning can significantly improve the predictive accuracy and reduce the complexity of classifiers induced by machine learning systems.Keywords: Constructive Induction
37.Webb, G. I. (1994). Recent Progress in Learning Decision Lists by Prepending Inferred Rules. In Proceedings of the Second Singapore International Conference on Intelligent Systems (SPICIS 94) Singapore. Singapore: Asia Computer Weekly, pages B280-B285.[Pre-publication PDF] Abstract:This paper describes a new algorithm for learning decision lists that operates by prepending successive rules to the front of the list under construction. By contrast, the classic algorithm operates by appending successive rules to the end of the decision list under construction. The new algorithm is demonstrated in the majority of cases to produce smaller classifiers that provide improved predictive accuracy in less time than the classic algorithm.Keywords: Rule Learning, Prepend
36.Webb, G. I. (1994). Generality Is More Significant Then Complexity: Toward An Alternative To Occams Razor. In C. Zhang, J. Debenham and D. Lukose (Eds.), Artificial Intelligence: Sowing the Seeds for the Future, Proceedings of Seventh Australian Joint Conference on Artificial Intelligence (AI'94) Armidale,NSW, Australia. Singapore: World Scientific, pages 60-67.[Pre-publication PDF] Abstract:Occam's Razor is widely employed in machine learning to select between classifiers with equal empirical support. This paper presents the theorem of decreasing inductive power: that, all other things being equal, if two classifiers a and b cover identical cases from the training set and a is a generalisation of b, a has higher probability than b of misclassifying a previously unsighted case. This theorem suggests that, to the contrary of Occam's Razor, generality, not complexity, should be used to select between classifiers with equal empirical support. Two studies are presented. The first study demonstrates that the theorem of decreasing inductive power holds for a number of commonly studied learning problems and for a number of different means of manipulating classifier generality. The second study demonstrates that generality provides a more consistent indicator of predictive accuracy in the context of a default rule than does complexity. These results suggest that the theorem of decreasing predictive power provides a suitable theoretical framework for the development of learning biases for use in selecting between classifiers with identical empirical supportKeywords: Occams Razor, Rule Learning, Generality
35.Yip, S. and G. I. Webb (1994). Empirical Function Attribute Construction in Classification Learning. In C. Zhang, J. Debenham and D. Lukose (Eds.), Artificial Intelligence: Sowing the Seeds for the Future, Proceedings of Seventh Australian Joint Conference on Artificial Intelligence (AI'94) Armidale,NSW, Australia. Singapore: World Scientific, pages 29-36.[Pre-publication PDF] Abstract:The merits of incorporating feature construction to assist selective induction in learning hard concepts are well documented. This paper introduces the notion of function attributes and reports a method of incorporating functional regularities in classifiers. Training sets are preprocessed with this method before submission to a selective induction classification learning system. The method, referred to as FAFA (function attribute finding), is characterised by finding bivariate functions that contribute to the discrimination between classes and then transforming them to function attributes as additional attributes of the data set. The value of each function attribute equals the deviation of each example from the value obtained by applying that function to the example. The expanded data set is then submitted to classification learning. Evaluation with published and artificial data shows that this method can improve classifiers in terms of predictive accuracy and complexity.Keywords: Constructive Induction
34.Webb, G. I. (1993). Feature Based Modelling: A Methodology for Producing Coherent, Consistent, Dynamically Changing Models of Agents Competency. In P. Brna, S. Ohlsson and H. Pain (Eds.), Proceedings of the 1993 World Conference on Artificial Intelligence in Education (AI-ED'93) Edinburgh, Scotland. Also published in User Modeling and User-Adapted Interaction. 5: 117-150, 1996. Charlottesville, VA: AACE, pages 497-504.[Pre-publication PDF] Abstract:Feature Based Modelling uses attribute value machine learning techniques to model an agent's competency. This is achieved by creating a model describing the relationships between the features of the agent's actions and of the contexts in which those actions are performed. This paper describes techniques that have been developed for creating these models and for extracting key information therefrom. An overview is provided of previous studies that have evaluated the application of Feature Based Modelling in a number of educational contexts including piano keyboard playing, the unification of Prolog terms and elementary subtraction. These studies have demonstrated that the approach is applicable to a wide spectrum of domains. Classroom use has demonstrated the low computational overheads of the technique. A new study of the application of the approach to modelling elementary subtraction skills is presented. The approach demonstrates accuracy in excess of 90\% when predicting student solutions. It also demonstrates the ability to identify and model students' buggy arithmetic procedures.Keywords: Feature Based Modeling, Case Based Learning
33.Webb, G. I. (1993). Systematic Search for Categorical Attribute-Value Data-Driven Machine Learning. In C. Rowles, H. Liu and N. Foo (Eds.), Proceedings of the Sixth Australian Joint Conference on Artificial Intelligence (AI'93) Melbourne, Australia. Singapore: World Scientific, pages 342-347.[Pre-publication PDF] Abstract:Optimal Pruning for Unordered Search is a search algorithm that enables complete search through the space of possible disjuncts at the inner level of a covering algorithm. This algorithm takes as inputs an evaluation function, e, a training set, t, and a set of specialisation operators, o. It outputs a set of operators from o that creates a classifier that maximises e with respect to t. While OPUS has exponential worst case time complexity, the algorithm is demonstrated to reach solutions for complex real world domains within reasonable time frames. Indeed, for some domains, the algorithm exhibits greater computational efficiency than common heuristic search algorithms.Keywords: Search, Rule Learning, OPUS
32.Webb, G.I. (1993). DLGref2: Techniques for Inductive Rule Refinement. In Proceedings of the 1993 IJCAI Workshop W16: Machine Learning and Knowledge Acquisition Chambery, France., pages 236-252.[Pre-publication PDF] Abstract:This paper describes and evaluates machine learning techniques for knowledge-base refinement. These techniques are central to Einstein, a knowledge acquisition system that enables a human expert to collaborate with a machine learning system at all stages of the knowledge-acquisition cycle. Experimental evaluation demonstrates that the knowledge-base refinement techniques are able to significantly increase the accuracy of nontrivial expert systems in a wide variety of domains.Keywords: Rule Learning
31.Webb, G. I. (1993). Control, Capabilities and Communication: Three Key Issues for Machine-Expert Collaborative Knowledge Acquisition. In N. Aussenac, G. Boy, B. Gaines, M. Linster, J.G. Ganascia and Y. Kodratoff (Eds.), Proceedings (Complement) of the Seventh European Workshop on Knowledge Acquisition for Knowledge-based Systems (EWKA'93) Toulouse, France. Springer-Verlag, pages 263-275.[Pre-publication PDF] Abstract:Machine learning and knowledge elicitation are different but complementary approaches to knowledge acquisition. On the face of it there are large potential gains to be reaped from the integration of these two knowledge acquisition techniques. Machine-expert collaborative knowledge acquisition combines these approaches by placing the machine learning system and the human expert as partners in the knowledge-acquisition task. This paper examines three key issues facing machine-expert collaborative knowledge-acquisition where should control reside, what capabilities should each partner bring to the task and how should the partners communicate?Keywords: Machine Learning with Knowledge Acquisition from Experts
30.Webb, G. I. and N. Brkic (1993). Learning Decision Lists by Prepending Inferred Rules. In S. Sestito (Ed.), Proceedings of the AI 93 Workshop on Machine Learning and Hybrid Systems Melbourne, Australia., pages 6-10.[Pre-publication PDF] Abstract:This paper describes a new algorithm for learning decision lists that operates by prepending successive rules to front of the list under construction. This contrasts with the original decision list induction algorithm which operates by appending successive rules to end of the list under construction. The new algorithm is demonstrated in the majority of cases to produce smaller classifiers that provide improved predictive accuracy than those produced by the original decision list induction algorithm.Keywords: Prepending,Rule Learning
29.Webb., G. I. and J. Agar (1992). Inducing Diagnostic Rules For Glomerular Disease With The DLG Machine Learning Algorithm. Artificial Intelligence in Medicine 4(6). Elsevier, pages 419-430.[Link to paper via Science Direct] Abstract:A pilot study has applied the DLG machine learning algorithm to create expert systems for the assessment and interpretation of clinical and laboratory data in glomerular disease. Despite the limited size of the data-set and major deficiencies in the information recorded therein, promising results have been obtained. On average, 100 expert systems developed from different subsets of the database, had a diagnostic accuracy of 54.70% when applied to cases that had not been used in their development. This compares with an average diagnostic accuracy of 48.31% obtained by four expert clinicians and of 3.23% obtained by random diagnosis. The expert systems demonstrated increased accuracy (62.90% on average) when cases of diseases represented by less than twenty examples were discarded. These results suggest that database expansion may enable the induction of diagnostic rules that provide accurate non-invasive diagnosis of specific categories of glomerular disease.Keywords: Rule Learning
28.Agar, J. and G. I. Webb (1992). The Application Of Machine Learning To A Renal Biopsy Data-Base. Nephrology, Dialysis and Transplantation 7. Oxford UK: Oxford University Press, pages 472-478.[Link to paper via Oxford Journals On-line] Abstract:This pilot study has applied machine learning (artificial intelligence derived qualitative analysis procedures) to yield non-invasive techniques for the assessment and interpretation of clinical and laboratory data in glomerular disease. To evaluate the appropriateness of these techniques, they were applied to subsets of a small database of 284 case histories and the resulting procedures evaluated against the remaining cases. Over such evaluations, the following average diagnostic accuracies were obtained: microscopic polyarteritis, 95.37%; minimal lesion nephrotic syndrome, 96.50%; immunoglobulin A nephropathy, 81.26%; minor changes, 93.66%; lupus nephritis, 96.27%; focal glomerulosclerosis, 92.06%; mesangial proliferative glomerulonephritis, 92.56%; and membranous nephropathy, 92.56%. Although in general the new diagnostic system is not yet as accurate as the histological evaluation of renal biopsy specimens, it shows promise of adding a further dimension to the diagnostic process. When the machine learning techniques are applied to a larger database, greater diagnostic accuracy should be obtained. It may allow accurate non- invasive diagnosis of some cases of glomerular disease without the need for renal biopsy. This may reduce both the cost and the morbidity of the investigation of glomerular disease and may be of particular value in situations where renal biopsy is considered hazardous or contraindicated.Keywords: Rule Learning
27.Kuzmycz, M. and G. I. Webb (1992). Evaluation of Feature Based Modelling in Subtraction. In C. Frasson, G. Gauthier and G. I. McCalla (Eds.), Lecture Notes in Computer Science Vol. 608: Proceedings of the Second International Conference on Intelligent Tutoring Systems (ITS'92) Montral, Canada. Berlin: Springer-Verlag, pages 269-276.[Pre-publication PDF][Link to paper via Springerlink] Abstract:One aim of intelligent tutoring systems is to tailor lessons to each individual student's needs. To do this a tutoring system requires a model of the student's knowledge. Cognitive modelling aims to produce a detailed explanation of the student's progress. Feature Based Modelling forms a cognitive model of the student by creating aspects of problem descriptions and of students' responses. This paper will discuss Feature Based Modelling and show the results of an evaluation carried out in the domain of elemental subtraction.Keywords: Feature Based Modeling, Case Based Learning
26.Yip, S. and G. I. Webb (1992). Function Finding in Classification Learning. In Proceedings of the Second Pacific Rim International Conference on Artificial Intelligence (PRICAI '92) Seoul, Korea. Berlin: Springer-Verlag, pages 555-561.[Pre-publication PDF] Abstract:The paper describes a method for extending domain models in classification learning by deriving new attributes from existing attributes. The process starts by finding functional regularities within each class. Such regularities are then treated as additional attributes in the subsequent classification learning process. The research revealed that these techniques can reduce the number of clauses required to describe each class, enable functional regularities between attributes to be incorporated in the classification procedures and, depending on the nature of data, significantly increase the coverage of class descriptions and improve the accuracy of classifying novel instances when compared to classification learning alone.Keywords: Constructive Induction
25.Yip, S. and G. I. Webb (1992). Discriminate Attribute Finding in Classification Learning. In A. Adams and L. Sterling (Eds.), Proceedings of the Fifth Australian Joint Conference on Artificial Intelligence (AI'92) Hobart, Tas., Australia. Singapore: World Scientific, pages 374-379.[Pre-publication PDF] Abstract:This paper describes a method for extending domain models in classification learning by deriving new attributes from existing ones. The process starts by examining examples of different classes which have overlapping ranges in all of their numeric attribute values. Based on existing attributes, new attributes which enhance the distinguishability of a class are created. These additional attributes are then used in the subsequent classification learning process. The research revealed that this method can enable relationships between attributes to be incorporated in the classification procedures and, depending on the nature of data, significantly increase the coverage of class descriptions, improve the accuracy of classifying novel instances and reduce the number of clauses in class description when compared to classification learning alone. Evaluation with the data on iris flower classification showed that the classification accuracy is slightly improved and the number of clauses in the class description is significantly reduced.Keywords: Constructive Induction
24.Webb, G. I. (1992). Man-Machine Collaboration for Knowledge Acquisition. In A. Adams and L. Sterling (Eds.), Proceedings of the Fifth Australian Joint Conference on Artificial Intelligence (AI'92) Hobart, Tas., Australia. Singapore: World Scientific, pages 329-334.[Pre-publication PDF] Abstract:Both machine learning and knowledge elicitation from human experts have unique strengths and weaknesses. Man-machine collaboration for knowledge acquisition allows both knowledge acquisition techniques to be employed hand- in-hand. The strengths of each can alleviate the other's weaknesses. This has the potential to both reduce the time taken to develop an expert system while increasing the quality of the finished product. This paper discusses techniques for man-machine collaboration for knowledge acquisition and describes Einstein, a computer system that implements those techniquesKeywords: *
23.Smith, P. and G. I. Webb (1992). Recent progress in the Development of a Debugging Assistant for Computer Programs. In . B. Chia, R. Pennell and R. Sims (Eds.), A Future Promised: Proceedings of the Fifth Australian Society for Computers in Learning in Tertiary Education Conference (ASCILITE '92) Sydney, Australia., pages 351-356.[Pre-publication PDF] Abstract:We present recent progress in the development of a debugging assistant for helping novices debug their computer programs. Bradman, which is still in the implementation phase, is an interactive system which builds two models of the user's program - one reflecting what the program actually does and the other reflecting what the programmer intended to do. Conflicts between these two models are used by Bradman to find bugs in the program.Keywords: Computer Science Education
22.Webb, G.I. and J. Agar (1991). The Application of Machine Learning to the Diagnosis of Glomerular Disease. In C. Sarmeinto (Ed.), Proceedings of the IJCAI Workshop W.15: Representing Knowledge in Medical Decision Support Systems Sydney, Australia., pages 8.1-8.8.[Pre-publication PDF] Abstract:A pilot study has applied the DLG machine learning algorithm to create expert systems for the assessment and interpretation of clinical and laboratory data in glomerular disease. Despite the limited size of the data-set and major deficiencies in the information recorded therein, for one of the conditions examined in this study, microscopic polyarteritis, a consistent diagnostic accuracy of 100% was obtained. With expansion of the data base, it is possible that techniques will be derived that provide accurate non-invasive diagnosis of some cases of glomerular disease, thus obviating the need for renal biopsy. Success in this project will result in significant reductions in both the cost and the morbidity associated with the investigation of glomerular disease.Keywords: Rule Learning
21.Webb, G.I. (1991). An Attribute-Value Machine Learning Approach To Student Modelling. In J. Kay and A. Quilici (Eds.), Proceedings of the IJCAI Workshop W.4: Agent Modelling for Intelligent Interaction Sydney, Australia., pages 128-136.[Pre-publication PDF] Abstract:This paper describes an application of machine learning to student modelling. Unlike previous machine learning approaches to student modelling, the new approach is based on attribute-value machine learning. In contrast to many previous approaches it is not necessary for the lesson author to identify all forms of error that may be detected or to identify the possible approaches to problem solving in the domain that may be adopted. Rather, the lesson author need only identify the relevant attributes both of the tasks to be performed by the student and of the student's actions. The values of these attributes are automatically processed by the student modeler to produce the student model.Keywords: Feature Based Modeling, Computer Based Learning
20.Webb, G. I. (1991). Einstein: An Interactive Inductive Knowledge-Acquisition Tool. In Proceedings of the Sixth Banff Knowledge Acquisition for Knowledge-Based Systems Workshop Banff, Canada., pages (36)1-16. Abstract:Einstein is a knowledge acquisition system that incorporates data-driven inductive rule development and refinement in a user driven production rule development and evaluation environment. This allows the user and the induction system to interact as a cooperative knowledge acquisition team. Unique features of this system include efficient automated inductive refinement of existing production rules, interactive user management of machine learning facilities, including local and global guidance, interactive specification of key examples and counter-examples and interactive case-based rule assessment.Keywords: Machine Learning with Knowledge Acquisition from Experts
19.Webb, G. I. (1991). Data Driven Inductive Refinement of Production Rules. In R. Quinlan (Ed.), Proceedings of the First Australian Workshop on Knowledge Acquisition for Knowledge-Based Systems (AKAW '91) Pokolbin, NSW, Australia. Sydney: University of Sydney Press., pages 44-52.[Pre-publication PDF] Abstract:This paper presents algorithms for inductive refinement of production rules based on the DLG data-driven machine learning algorithm. These algorithms modify the input production rules with reference to a set of examples so as to ensure that all positive examples are covered and no negative examples are covered. The input production rules may either have been previously learnt by a machine learning system or be extracted from an existing expert system.Keywords: Rule Learning
18.Kuzmycz, M. and G. I. Webb (1991). Modelling Elementary Subtraction: Intelligent Warfare Against Bugs. In R. Godfrey (Ed.), Simulation & Academic Gaming in Tertiary Education, The Proceedings of the Eighth Annual Conference of ASCILITE (ASCILITE '91) Launceston, TAS, Australia. Launceston: University of Tasmania, pages 367-376.[Pre-publication PDF] Abstract:This paper discusses an intelligent system .that uses Input/ Output Cognitive Modelling (IOCM) techniques to form a model of the student. The paper describes FBM, an IOCM system that uses features to represent the inputs and outputs of the tasks being presented to the student and forms a relationship which describes in essence the knowledge the student has in the domain. Also presented is ASPMoRe, an intelligent tool that takes the model of the student and adapts the lesson to both refine the model and give the student practice in weak areas of his knowledge. Results have shown that the system can be an effective tool for educational purposes.Keywords: Feature Based Modeling, Computer Based Learning
17.Smith, P. and G. I. Webb (1991). Debugging Using Partial Models. In R. Godfrey (Ed.), Simulation & Academic Gaming in Tertiary Education, The Proceedings of the Eighth Annual Conference of ASCILITE (ASCILITE '91) Launceston, TAS, Australia. Launceston: University of Tasmania, pages 581-590.[Pre-publication PDF] Abstract:We present initial work on an expert system that will help users to debug their Pascal programs. This system asks the user questions concerning attempts to build a 'partial model' of the program - a model of those aspects of the program likely to relate to the error. This contrasts with previous systems in which a complete model of the user's program is built and compared to templates of correct versions of the program. The advantages of this approach are greater flexibility, greater student involvement in the debugging process and lower computational overheads.Keywords: Computer Science Education
16.Webb, G. I. (1991). Inside the Unification Tutor: The Architecture of an Intelligent Educational System. In R. Godfrey (Ed.), Simulation & Academic Gaming in Tertiary Education, The Proceedings of the Eighth Annual Conference of ASCILITE (ASCILITE '91). Launceston: University of Tasmania, pages 677-684.[Pre-publication PDF] Abstract:The Unification Tutor provides practice and tuition on the unification of terms from the Prolog programming language. It integrates multiple knowledge sources encompassing both performance and declarative knowledge. A key feature of the tutor is the use of a detailed student model. It has been used since 1989 in Computer Science courses at Deakin and La Trobe Universities. Previous papers have examined the student modelling component of this system. This paper investigates the internal operation of the Unification Tutor, the sub-systems it incorporates and their interaction.Keywords: Feature Based Modeling, Computer Based Learning
15.Surruwerra, L. Sanzogni,F. and G.I. Webb (1990). Improving the Efficiency of Rule Based Expert Systems by Rule Activation. Journal of Experimental and Theoretical Artificial Intelligence 2. Taylor and Francis, pages 369-380.[Pre-publication PDF][Journal of Experimental & Theoretical Artificial Intelligence] Abstract:In this paper we test a hypothesis that has shown promise in enhancing the efficiency (run-time) of rule-based systems. The results of our experiments suggest that the use of rule activation plays an active part in improving the performance of rule bases containing conflict sets.
14.Webb, G. I., G. Cumming, T. Richards, and K-K. Yum (1990). Educational Evaluation of Feature Based Modelling in a Problem Solving Domain. In R. Lewis and S. Otsuki (Eds.), Proceedings of the IFIP TC3 International Conference on Advanced Research on Computers in Education (ARCE'90) Tokyo, Japan. Amsterdam: Elsevier, pages 101-108.[Pre-publication PDF] Abstract:Feature-Based Modelling is a machine learning based cognitive modelling methodology. An intelligent educational system has been implemented, for the purpose of evaluating the methodology, which helps students learn about the unification of terms from the Prolog programming language. The system has been used by Third Year Computer Science students at La Trobe University during September 1989. Students were randomly allocated to an Experimental condition, in which FBM modelling was used to select tasks, and give extra comments, or to a Control condition in which similar tasks and comments were given, but without FBM tailoring to the individual. Ratings of task appropriateness, and comment usefulness, were collected on-line as the students worked with the tutor; overall ratings were obtained by questionnaire at the end; and semester exam results were examined. Despite the fact that only a minority of students showed sufficient misunderstanding for FBM to have potential value, of the ten comparisons chat relate most directly to the aims of the Tutor, while in no case reaching significance, seven were in favour of the Tutor, and only two against. These preliminary results are very encouraging for the FBM principles of the Tutor.Keywords: Feature Based Modeling, Computer Based Learning
13.Webb, G. I. (1990). Rule Optimisation and Theory Optimisation: Heuristic Search Strategies for Data-Driven Machine Learning. In H. Motada, R. Mizoguchi, J. Boose and B. Gaines (Eds.), Proceedings of the First Japanese Knowledge Acquisition for Knowledge-Based Systems Workshop (JKAW'90) Kyoto & Hatoyama, Japan. Tokyo: IOS Press, pages 219-232.[Pre-publication PDF] Abstract:Previous implementations of the Aq algorithm have used rule optimisation search strategies to attempt to develop optimal classification procedures. These strategies involve generating successive characteristic descriptions each of which is individually of maximal value. This is contrasted with theory optimisation search strategies which, instead, generate successive complete classification procedures from which those with the maximal value are selected. These two strategies have been applied to the domain of the diagnosis of Immunoglobulin A Nephropathy disease. The theory optimisation strategy was observed to out perform the rule optimisation strategy.Keywords: Rule Learning
12.Webb, G. I. (1989). Courseware Abstraction: Reducing Development Costs While Producing Qualitative Improvements in CAL. Journal of Computer Assisted Learning 5. Blackwell Publishing, pages 103-113.[Pre-publication PDF][Link to Blackwell Publishing and the Journal of Computer Assisted Learning] Abstract:Courseware abstraction is an approach to CAL whereby the lesson author creates a general parametetized CAL lesson that is then applied to many concrete examples. This approach has the following advantages over alternative approaches to lesson development: it is cost efficient; it facilitates lesson verification; it encourages the provision of as many examples as are desirable; it simplifies the selection of appropriate examples for presentation to each student; it provides a convenient framework for student evaluation, and it supports the development of factually exhaustive lessons. In short it provides qualitative improvements, while at the same time reducing lesson development costs. Although widely used, courseware abstraction has not previously been identified as an important CAL technique and its relative merits have never received attention. In particular, there has been a failure to recognize that generative CAL derives most of its power from the use of courseware abstraction.Keywords: Computer Based Learning
11.Webb, G. I. (1989). A Machine Learning Approach to Student Modelling. In Proceedings of the Third Australian Joint Conference on Artificial Intelligence (AI 89) Melbourne, Australia., pages 195-205.[Pre-publication PDF] Abstract:This paper describes an application of established machine learning principles to student modelling. Unlike previous machine learning based approaches to student modelling, the new approach is based on attribute-value machine learning. In contrast to many previous approaches it is not necessary for the lesson author to identify all forms of error that may be detected. Rather, the lesson author need only identify the relevant attributes both of the tasks to be performed by the student and of the student's actions. The values of these attributes are automatically processed by the student modeler to produce the student model.Keywords: Feature Based Modeling, Computer Based Learning
10.Webb, G. I., G. Cumming, T. Richards, and K-K. Yum (1989). The Unification Tutor: An Intelligent Educational System in the Classroom. In G. Bishop and J. Baker (Eds.), Proceedings of the Seventh Annual Conference of the Australian Society for Computers in Learning in Tertiary Education (ASCILITE '89) Gold Coast, QLD, Australia. Gold Coast: Bond University, pages 408-420.[Pre-publication PDF] Abstract:The Unification Tutor is experimental Intelligent Tutoring System for the domain of the unification of Prolog terms. It demonstrates the interactive use of Feature-Based Modelling - an approach to cognitive modelling that has been presented at previous ASCILITE Conferences (Webb, 1988b.) The Unification Tutor has been used by Third Year Computer Science students at La Trobe University during September 1989. This paper describes the Unification Tutor and evaluates its performance at La Trobe.Keywords: Feature Based Modeling, Computer Based Learning, Computer Science Education
9.Webb, G. I. (1988). A Knowledge-Based Approach To Computer-Aided Learning. International Journal of Man-Machine Studies 29. Academic Press, pages 257-285.[Pre-publication PDF][Now known as Int. Journal of Human-Computer Studies] Abstract:This paper describes a methodology for the creation of knowledge-based computer- aided learning lessons. Unlike previous approaches, the knowledge base is utilized only for restricted aspects of the lesson - both for the management of flow of control through a body of instructional materials and for the evaluation of the student's understanding of the subject matter. This has many advantages. While the approach has lower developmental and operational overheads than alternatives it is also able to perform far more flexible evaluations of the student's performance. As flow of control is managed by a knowledge-based component with reference to a detailed analysis of the student's understanding of the subject matter, lessons adapt to each student's individual understanding and aptitude within a domain.Keywords: Computer Based Learning
8.Richards, T., G. I. Webb, and N. Craske (1988). Object-oriented Control for Intelligent Computer Assisted Learning Systems. In P. Ercoli and R. Lewis (Eds.), Proceedings of the IFIP TC3 Working Conference on Artificial Intelligence Tools in Education Frascati, Italy. North-Holland, Amsterdam: Elsevier, pages 203-219.[Pre-publication PDF] Abstract:This paper investigates an approach to providing a general-purpose authoring/tutoring shell for intelligent computer assisted learning systems. The approach is to outline an object-oriented representation of task/goal hierarchies, then to consider the ways in which domain expertise, student information and teacher expertise can be made to interact with such hierarchies. The result is a skeleton in terms of which exploratory lessons are being constructed; and in terms of which further research on the domain expert system, student modelling, educational expertise modelling, and user interfacing can more concretely be developed.Keywords: Computer Based Learning
7.Webb, G. I. (1988). Techniques for Efficient Empirical Induction. In C. J. Barter and M. J. Brooks (Eds.), Lecture Notes in Artificial Intelligence Vol. 406: Proceedings of the Second Australian Joint Conference on Artificial Intelligence (AI'88) Adelaide, S.A., Australia. Berlin: Springer-Verlag, pages 225-239.[Pre-publication PDF][Link to paper via Springerlink] Abstract:This paper describes the LEI algorithm for empirical induction. The LEI algorithm provides efficient empirical induction for discrete attribute value data. It derives a classification procedure in the form of a set of predicate logic classification rules. This contrasts with the only other efficient approach to exhaustive empirical induction, the derivatives of the CLS algorithm, which present their classification procedures in the form of a decision tree. The LEI algorithm will always find the simplest non-disjunctive rule that correctly classifies all examples of a single class where such a rule exists.Keywords: Rule Learning
6.Webb, G. I. (1988). Cognitive Diagnosis Using Student Attributions. In K. Fielden, F. Hicks and N. Scott (Eds.), Computers in Learning in Tertiary education: Proceedings of the Sixth Annual Conference of the Australian Society for Computers in Learning in Tertiary Education (ASCILITE-88) Canberra, Australia., pages 502-514.[Pre-publication PDF] Abstract:This paper details an approach to cognitive diagnosis that enables the inference of detailed models of a student's conceptualisation of a domain. This model is constructed be examining the attributes of the problems that the student has tackled and the student's performance while tackling those problems. A feature network is used to represent educationally relevant domain knowledge. This approach has low implementation and operational overheads; It provides a detailed model of the student's conceptualisation of the subject domain in terms of elements of knowledge from that domain; Student models are not restricted to overlays of predefined correct and/or incorrect knowledge; It does not require that the instructional designer anticipate the possible forms of error that may occur; It is robust in the face of partial evaluation of student performance; It is also robust in the face of the instructional designer's failure to incorporate relevant aspects of the subject domain in the knowledge-base; The student models can be executed; It supports accurate diagnosis of multiple viewpoints of the domain even when those viewpoints are not anticipated by the instructional designer; It can support multiple teaching styles in the one lesson.Keywords: Feature Based Modeling, Computer Based Learning
5.Webb, G. I. (1987). Domain and Tutoring Knowledge in Computer-Aided Learning. In J. Gero and F. Sudweeks (Eds.), Proceedings of the First Australian Joint Conference on Artificial Intelligence (AI'87) Sydney, Australia. Sydney: The University of Sydney Printing Service, pages 488-502.[Pre-publication PDF] Abstract:Previous approaches to the utilisation of expertise in computer-aided learning have emphasised expertise in the subject domain. By contrast, this paper details an approach that emphasises tutoring expertise and only relies on minimal domain expertise. This has several advantages. The intelligent use of restricted domain expertise enables the detailed evaluation of the students' understanding of the domain. This permits the provision of very flexible tuition that uniquely adjusts to each student's understanding of the domain. Further, due to the restricted nature of the domain knowledge that is required, the developmental overheads associated with a lesson are minimal. Finally, the type of domain knowledge required has a well defined semantics further enabling its intelligent manipulation. The approach described is domain independent. This paper describes the general system architecture, the knowledge representation formalism used and the tutoring strategies that are employed.Keywords: Computer Based Learning
4.Webb, G. I. (1987). Generative CAL and Courseware Abstraction. In J. Barrett and J. Hedberg (Eds.), Using computers intelligently in Tertiary Education: Proceedings of the Fifth Annual Conference of the Australian Society for Computers in Learning in Tertiary Education (ASCILITE-87) Sydney, Australia., pages 257-285.[Pre-publication PDF] Abstract:Courseware abstraction is an approach to CAL whereby the lesson author creates a general parameterised CAL lesson that is then applied to many concrete examples. This approach has the following advantages: 1. it provides a powerful framework within which to adapt tuition to a student's knowledge and aptitude; 2. it encourages the development of detailed treatments of the subject matter; it reduces the cost of lesson development as a ratio to student lesson time; 3. and it enables large numbers of examples to be made available for individual students. Generative CAL is an example of courseware abstraction. It is argued that the advantages of generative CAL do not arise directly from the generation of the examples to be examined but rather can be directly attributed to the use of courseware abstraction.Keywords: Computer Based Learning
3.Webb, G.I. (1986). Knowledge Based Flow of Control in Computer-Aided Learning. In Proceedings of the First Australian Artificial Intelligence Congress (1AAIC'86) Melbourne, Australia., pages B: 1-7.[Pre-publication PDF] Abstract:In this paper I examine the utilisation of knowledge representation in Computer-Aided Learning (CAL) with the aim of establishing knowledge-based CAL techniques that are best suited to current technology. Most existing knowledge-based CAL systems attempt to generate the entire instructional sequence directly from a domain knowledge base. Such systems suffer from several limitations. These limitations include: 1. It is questionable whether the techniques exist to produce such systems for any but a highly restricted set of domains. 2. Even for those domains in which such systems can be produced the overheads are prohibitive for most purposes. Given these limitations, I argue that knowledge representation should be utilised in CAL only for those aspects of the instructional process for which it results in substantial gains without prohibitive overheads. I demonstrate that one aspect of CAL for which this holds is for managing flow of control within instructional material. I provide a detailed description of feature networks. These are a variant of M.A.K. Halliday's system network formalism. Feature networks are a knowledge representation formalism that efficiently encodes exactly the knowledge that is required for knowledge-based flow of control. It is shown that computer based lessons that utilise feature networks for control flow of control are extremely economic in terms of both authoring time and computer resources while providing highly responsive tuition. DABIS, a system that embodies the methodology outlined above, has been implemented and is described. DABIS demonstrates the feasibility of this methodology. There are no fundamental restrictions to the domains to which it can be applied. A lesson created under the DABIS system has been found to have an authoring to student time ration of approximately 12:1. Lessons created by the system also demonstrate the sensitivity to a student's knowledge and abilities within a domain that results from the intelligent use of knowledge-based CAL.
2.Webb, G. I. (1986). The Domain-Analysis Based Instruction System. In G. Bishop and W. vanLint (Eds.), Proceedings of the Fourth Annual Computer-Assisted Learning in Tertiary Education Conference (CALITE'86) Adelaide, Australia. Adelaide: University of Adelaide, pages 295-302.[Pre-publication PDF] Abstract:At the past two CALITE conferences I have described a methodology for creating knowledge-based CAL. This paper describes how that methodology has evolved. The Domain-Analysis Based Instruction System, a CAL system that utilises the methodology is then described in detail.Keywords: Computer Based Learning
1.Richards, T. and G. I. Webb (1985). ECCLES An Expert System for CAL. In H. Garrett (Ed.), Proceedings of the Tenth Western Educational Computing Conference (WECC'85) Oakland, CA. North Hollywood, CA: : Western Periodicals Company, pages 151-157.[Pre-publication PDF] Abstract:An authoring and lesson management system is described for Computer Assisted Learning in which lesson questioning and control flow arising from student response are generated at lesson time from an internal model of the lesson topic area. This approach permits the rapid authoring of conceptually complex lesson material. A languageless menu-driven authoring system rninimises the system familiarization time for lesson authors and enforces the construction of logically complete lessons. From the student's point of view, the system appears as an expert system in the lesson subject area, with precise and detailed knowledge of the topic taught and of the causes of the student's own errors.Keywords: Computer Based Learning