Friday, April 23, 2010

Top 10 Algorithms in Data Mining






The authors of this paper invited ACM KDD Innovation Award and IEEE ICDM Research Contributions Award winners to each nominate up to 10 of the best-known algorithms in data mining, giving the algorithm name, a justification for the nomination, and a representative publication reference. Other IEEE and ACM award winners then voted on the nominations to narrow them down to a top 10 list. These algorithms are used for association analysis, classification, clustering, statistical learning, and much more. The paper is linked at the end of this post.



Here are the winners:



  1. C4.5


  2. The k-Means algorithm


  3. Support Vector Machines


  4. The Apriori algorithm


  5. Expectation-Maximization


  6. PageRank


  7. AdaBoost


  8. k-Nearest Neighbor Classification


  9. Naive Bayes


  10. CART (Classification and Regression Trees)


For each algorithm, the paper gives a brief overview of what it is commonly used for and how it works, along with lots of references. It also describes how these winners were selected in much more detail than I have here.
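
To give a flavor of what "how it works" looks like for one entry on the list, here is a minimal sketch of k-means (number 2) in plain Python. It is not taken from the paper: the function name `kmeans`, its parameters, and the choice to seed the centroids with the first k points are all assumptions made to keep the example short and deterministic. For real work you would want a library implementation with a better initialization.

```python
def kmeans(points, k, iterations=100):
    """Cluster equal-length numeric tuples with Lloyd's algorithm (a sketch)."""
    # Assumption: seed the centroids with the first k points. This is
    # deterministic but not a recommended initialization in practice.
    centroids = [tuple(p) for p in points[:k]]
    clusters = [[] for _ in range(k)]
    for _ in range(iterations):
        # Assignment step: put each point in the cluster of its nearest
        # centroid, using squared Euclidean distance.
        clusters = [[] for _ in range(k)]
        for p in points:
            distances = [sum((a - b) ** 2 for a, b in zip(p, c)) for c in centroids]
            clusters[distances.index(min(distances))].append(p)
        # Update step: move each centroid to the mean of its cluster.
        new_centroids = []
        for old, members in zip(centroids, clusters):
            if members:
                new_centroids.append(tuple(sum(col) / len(members) for col in zip(*members)))
            else:
                new_centroids.append(old)  # an empty cluster keeps its old centroid
        if new_centroids == centroids:
            break  # assignments have stopped changing
        centroids = new_centroids
    return centroids, clusters


# Hypothetical toy data: two visually obvious groups in 2-D.
data = [(1.0, 1.0), (1.5, 2.0), (8.0, 8.0), (9.0, 9.5), (1.0, 0.5), (8.5, 9.0)]
centroids, clusters = kmeans(data, k=2)
for center, members in zip(centroids, clusters):
    print(center, members)
```

The loop alternates between the two steps until the centroids stop moving or the iteration cap is reached, which is the basic idea behind the algorithm; the paper covers its limitations and variants.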



The exciting thing is that I've seen nearly all of these algorithms used to mine genetic data for complex patterns of genetic and environmental exposures that influence complex disease. See some of the recent papers from EvoBio and PSB for examples. Better still, many of these methods are implemented in R packages.



Top 10 Algorithms in Data Mining (PDF)




Reference: http://gettinggeneticsdone.blogspot.com/2010/04/top-10-algorithms-in-data-mining.html
