Phishing Website Detection Using Machine Learning in Weka | Seelio

This project used data collected from PhishTank to build machine learning models that classify websites as either phishing or legitimate. Such a classifier could protect users by warning them before they visit a website that puts them at risk. Logistic regression, Naive Bayes, k-nearest neighbors, decision tree, and random forest models were trained, and attribute subset selection was performed to further improve performance.
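Attribute subset selection can be done in several ways, and this write-up does not say which evaluator or search method was used in Weka. As a hedged illustration only, the sketch below shows one common approach, wrapper-based greedy forward selection. Starting from no attributes, it repeatedly adds the attribute that most improves a classifier's accuracy and stops when no addition helps. The scoring classifier (leave-one-out 1-nearest-neighbor with Hamming distance), the function names, and the four binary attributes and six-row dataset are all hypothetical stand-ins, not the project's PhishTank data or Weka setup.

```python
from typing import Callable, List, Sequence, Tuple

# Hypothetical toy data (NOT PhishTank): each row holds four binary attributes,
# e.g. illustrative flags such as "URL uses an IP address" or "URL is long".
# Label 1 = phishing, 0 = legitimate.
X: List[List[int]] = [
    [1, 0, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 1],
    [0, 1, 1, 1],
    [0, 0, 1, 1],
    [0, 1, 0, 0],
]
y: List[int] = [1, 1, 1, 0, 0, 0]


def loo_1nn_accuracy(X: Sequence[Sequence[int]], y: Sequence[int],
                     cols: Sequence[int]) -> float:
    """Leave-one-out accuracy of a 1-nearest-neighbor classifier that
    measures Hamming distance over the attribute indices in `cols`."""
    if not cols:
        return 0.0  # no attributes selected yet: treat as zero accuracy
    correct = 0
    for i in range(len(X)):
        best_dist, best_label = None, None
        for j in range(len(X)):
            if i == j:
                continue  # hold out row i
            dist = sum(X[i][c] != X[j][c] for c in cols)
            # Strict '<' keeps the first row found when distances tie.
            if best_dist is None or dist < best_dist:
                best_dist, best_label = dist, y[j]
        correct += best_label == y[i]
    return correct / len(X)


def forward_select(n_attrs: int,
                   score: Callable[[List[int]], float]) -> Tuple[List[int], float]:
    """Greedy forward selection: add the attribute that most improves
    `score`; stop when no remaining attribute improves it."""
    selected: List[int] = []
    best_score = score(selected)
    remaining = list(range(n_attrs))
    while remaining:
        # Score every one-attribute extension of the current subset;
        # max() returns the first (lowest-index) candidate on ties.
        cand, cand_score = max(
            ((a, score(selected + [a])) for a in remaining),
            key=lambda pair: pair[1],
        )
        if cand_score <= best_score:
            break  # no remaining attribute improves accuracy
        selected.append(cand)
        remaining.remove(cand)
        best_score = cand_score
    return selected, best_score


subset, acc = forward_select(len(X[0]), lambda cols: loo_1nn_accuracy(X, y, cols))
print(subset, acc)  # [0] 1.0
```

Because the search is greedy, it evaluates at most n(n+1)/2 subsets for n attributes rather than all 2^n, at the cost of possibly missing the best combination of attributes.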

Before subset selection, Naive Bayes had the best prediction accuracy (86.6%), but logistic regression and random forest surpassed it once the number of predictors was reduced through subset selection. After subset selection, logistic regression had the highest prediction accuracy (89.9%), followed by random forest (89.3%). As more people browse the web every day, it is increasingly important for antivirus and firewall software to protect users from malicious websites.

Figure: Prediction accuracy of different algorithms after subset selection.



    Denis Nguyen / Student

    Rutgers University-New Brunswick
    This was a two-person project; I contributed to every stage, from data preparation to analysis.


Computer Science
