This paper is published in Volume 3, Issue 5, 2018
Area
Big Data
Author
Lakshminarayanan
Org/Univ
Anna University, Chennai, Tamil Nadu, India
Pub. Date
28 May, 2018
Paper ID
V3I5-1203
Publisher
Keywords
Frequent pattern mining, Big data, Pruning, Support count, Confidence score, Map Reduce, Hadoop

Citations

IEEE
Lakshminarayanan. Frequent pattern mining on big data using Apriori algorithm, International Journal of Advance Research, Ideas and Innovations in Technology, www.IJARnD.com.

APA
Lakshminarayanan (2018). Frequent pattern mining on big data using Apriori algorithm. International Journal of Advance Research, Ideas and Innovations in Technology, 3(5) www.IJARnD.com.

MLA
Lakshminarayanan. "Frequent pattern mining on big data using Apriori algorithm." International Journal of Advance Research, Ideas and Innovations in Technology 3.5 (2018). www.IJARnD.com.

Abstract

Frequent pattern mining is one of the most important tasks for extracting meaningful and useful information from raw data. It aims to extract itemsets that represent any type of homogeneity and regularity in the data. Although many efficient algorithms have been developed for this task, the growing interest in data has caused the performance of existing pattern mining techniques to drop. The goal of this paper is to propose new, efficient pattern mining algorithms that work on big data. Existing pattern mining algorithms are based on the homogeneity and regularity of data. With the dramatic increase in the scale of datasets collected and stored with cloud services in recent years, the mining process in the cloud requires more computation power. A number of works have also converted approximate mining computation into exact computation, but such methods improve neither accuracy nor efficiency. The proposed algorithm uses the Hadoop Distributed File System (HDFS) for frequent pattern mining, which improves the performance of the system. An iterative Apriori algorithm is used to extract the frequent patterns from the dataset. In this approach, the first set of candidate itemsets is extracted from the initial dataset, and in each subsequent iteration the candidate itemsets are generated from the itemsets retained in the previous iteration. The support count is calculated for each candidate itemset; the support of an itemset is the frequency with which its items occur together in the dataset. The confidence value is calculated to find the dependency between itemsets. A threshold value is calculated, and pruning is performed based on this value.
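The iterative procedure described in the abstract (candidate generation, support counting, threshold-based pruning, and confidence) can be illustrated with a minimal single-machine sketch. This is not the paper's Hadoop implementation: the function names `support_counts`, `apriori`, and `confidence`, and the choice of an absolute count for `min_support`, are assumptions made for illustration only. In a MapReduce setting, the support-counting step is the part that would typically be distributed, with mappers emitting a count per candidate found in a transaction and reducers summing those counts.

```python
from itertools import combinations


def support_counts(transactions, candidates):
    """Count how many transactions contain each candidate itemset."""
    counts = {c: 0 for c in candidates}
    for t in transactions:
        for c in candidates:
            if c <= t:  # candidate is a subset of the transaction
                counts[c] += 1
    return counts


def apriori(transactions, min_support):
    """Return {itemset: support count} for every frequent itemset.

    min_support is an absolute count (an assumption of this sketch).
    """
    transactions = [frozenset(t) for t in transactions]
    # Iteration 1: the candidates are the single items of the dataset.
    candidates = {frozenset([item]) for t in transactions for item in t}
    frequent = {}
    k = 1
    while candidates:
        counts = support_counts(transactions, candidates)
        # Pruning: keep only candidates that reach the threshold.
        level = {c: n for c, n in counts.items() if n >= min_support}
        frequent.update(level)
        k += 1
        # Join: merge pairs of retained (k-1)-itemsets into k-itemsets.
        joined = {a | b for a, b in combinations(list(level), 2)
                  if len(a | b) == k}
        # Apriori property: every (k-1)-subset of a candidate must
        # itself have been retained in the previous iteration.
        candidates = {
            c for c in joined
            if all(frozenset(s) in level for s in combinations(c, k - 1))
        }
    return frequent


def confidence(frequent, antecedent, consequent):
    """Confidence of antecedent -> consequent.

    Both the antecedent and the combined itemset must be frequent,
    otherwise a KeyError is raised.
    """
    a = frozenset(antecedent)
    both = a | frozenset(consequent)
    return frequent[both] / frequent[a]
```

Calling `apriori(transactions, min_support)` on a list of item collections returns the frequent itemsets with their support counts, and `confidence(frequent, {"x"}, {"y"})` then gives the fraction of transactions containing `x` that also contain `y`. Sets are stored as `frozenset` so they can be used as dictionary keys.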