1.
Define Association Rule Mining.
Association rule mining searches for interesting relationships among items in a given data set.
2.
When can we say that association rules are interesting?
Association rules are considered interesting if they satisfy both a minimum support threshold and a minimum confidence threshold. Users or domain experts can set such thresholds.
3.
Explain association rules in mathematical notation.
Let $I = \{i_1, i_2, \ldots, i_m\}$ be a set of items.
Let $D$, the task-relevant data, be a set of database transactions, where each transaction $T$ is a set of items such that $T \subseteq I$.
An association rule is an implication of the form $A \Rightarrow B$, where $A \subset I$, $B \subset I$, and $A \cap B = \emptyset$. The rule $A \Rightarrow B$ holds in the transaction set $D$ with support $s$, where $s$ is the percentage of transactions in $D$ that contain $A \cup B$. The rule $A \Rightarrow B$ has confidence $c$ in the transaction set $D$ if $c$ is the percentage of transactions in $D$ containing $A$ that also contain $B$.
4.
Define support and confidence in Association rule mining.
Support: $s$ is the percentage of transactions in $D$ that contain $A \cup B$.
Confidence: $c$ is the percentage of transactions in $D$ containing $A$ that also contain $B$.
$\text{support}(A \Rightarrow B) = P(A \cup B)$
$\text{confidence}(A \Rightarrow B) = P(B \mid A)$
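The two formulas above can be checked on a small example. This is a minimal sketch that assumes transactions are Python sets of item names; the nine transactions and the item names I1 to I5 are hypothetical toy data.

```python
# Hypothetical toy transactions, each a set of items
transactions = [
    {"I1", "I2", "I5"}, {"I2", "I4"}, {"I2", "I3"},
    {"I1", "I2", "I4"}, {"I1", "I3"}, {"I2", "I3"},
    {"I1", "I3"}, {"I1", "I2", "I3", "I5"}, {"I1", "I2", "I3"},
]

def support(itemset, transactions):
    """Fraction of transactions that contain every item of `itemset`."""
    itemset = set(itemset)
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)

def confidence(a, b, transactions):
    """P(B | A) = support(A ∪ B) / support(A)."""
    return support(set(a) | set(b), transactions) / support(a, transactions)

print(support({"I1", "I2"}, transactions))                # 4/9 ≈ 0.444
print(confidence({"I1", "I2"}, {"I5"}, transactions))     # (2/9) / (4/9) = 0.5
```

Here the rule {I1, I2} ⇒ {I5} has support 2/9 and confidence 0.5, so it would be strong only if both values meet the user's thresholds.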
5.
How are association rules mined from large databases?
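• Step 1: Find all frequent item sets (item sets that occur at least as frequently as the minimum support count).
• Step 2: Generate strong association rules (rules that satisfy minimum support and minimum confidence) from the frequent item sets.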
6.
Describe the different classifications of Association rule mining.
• Based on the types of values handled in the rule
i. Boolean association rule
ii. Quantitative association rule
• Based on the dimensions of data involved
i. Single dimensional association rule
ii. Multidimensional association rule
• Based on the levels of abstraction involved
i. Multilevel association rule
ii. Single level association rule
• Based on various extensions
i. Correlation analysis
ii. Mining max patterns
7.
What is the purpose of the Apriori algorithm?
The Apriori algorithm is an influential algorithm for mining frequent item sets for Boolean association rules. The name of the algorithm is based on the fact that it uses prior knowledge of frequent item set properties.
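A level-wise Apriori pass can be sketched as follows. This is a minimal illustration, not a tuned implementation: it assumes an absolute minimum support count, and the join step is written as a simple pairwise union of frequent (k−1)-item sets rather than the usual lexicographic-prefix join. After the prune step both joins leave the same candidates. The toy transactions are hypothetical.

```python
from itertools import combinations

def apriori(transactions, min_count):
    """Return {frozenset(itemset): support_count} for every frequent item set."""
    transactions = [frozenset(t) for t in transactions]
    # L1: count single items and keep the frequent ones
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    current = {s: c for s, c in counts.items() if c >= min_count}
    frequent = dict(current)
    k = 2
    while current:
        prev = set(current)
        # Join step: unions of two frequent (k-1)-item sets that have exactly k items
        candidates = {a | b for a in prev for b in prev if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent
        candidates = {
            c for c in candidates
            if all(frozenset(s) in prev for s in combinations(c, k - 1))
        }
        # One database scan counts the surviving candidates
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        current = {c: n for c, n in counts.items() if n >= min_count}
        frequent.update(current)
        k += 1
    return frequent

# Hypothetical toy transactions
transactions = [
    {"I1", "I2", "I5"}, {"I2", "I4"}, {"I2", "I3"},
    {"I1", "I2", "I4"}, {"I1", "I3"}, {"I2", "I3"},
    {"I1", "I3"}, {"I1", "I2", "I3", "I5"}, {"I1", "I2", "I3"},
]
frequent = apriori(transactions, min_count=2)
for itemset, count in sorted(frequent.items(), key=lambda x: (len(x[0]), sorted(x[0]))):
    print(sorted(itemset), count)
```

With a minimum support count of 2, this data yields 13 frequent item sets, the largest being {I1, I2, I3} and {I1, I2, I5}.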
8.
Define anti-monotone property.
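If a set cannot pass a test, all of its supersets will fail the same test as well. For example, if an item set does not meet the minimum support, none of its supersets can meet it either.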
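The property can be observed directly: adding items to a set can only keep or shrink the number of transactions that contain it. This sketch uses hypothetical toy transactions and an assumed minimum support count of 2. It takes one infrequent set and checks every one of its supersets.

```python
from itertools import combinations

# Hypothetical toy transactions
transactions = [
    {"I1", "I2", "I5"}, {"I2", "I4"}, {"I2", "I3"},
    {"I1", "I2", "I4"}, {"I1", "I3"}, {"I2", "I3"},
    {"I1", "I3"}, {"I1", "I2", "I3", "I5"}, {"I1", "I2", "I3"},
]

def support_count(itemset, transactions):
    """Number of transactions that contain every item of `itemset`."""
    return sum(1 for t in transactions if itemset <= t)

min_count = 2
infrequent = frozenset({"I1", "I4"})  # fails the minimum support test
others = sorted(set().union(*transactions) - infrequent)
# Every superset: the infrequent set plus any nonempty combination of other items
supersets = [infrequent | set(extra)
             for k in range(1, len(others) + 1)
             for extra in combinations(others, k)]

print(support_count(infrequent, transactions))  # 1
print(all(support_count(s, transactions) < min_count for s in supersets))  # True
```

This is why Apriori can discard any candidate with an infrequent subset without counting it.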
9.
How are association rules generated from frequent item sets?
Association rules can be generated as follows:
For each frequent item set $l$, generate all nonempty subsets of $l$.
For every nonempty subset $s$ of $l$, output the rule $s \Rightarrow (l - s)$ if
$\dfrac{\text{support\_count}(l)}{\text{support\_count}(s)} \ge \text{min\_conf}$,
where min_conf is the minimum confidence threshold.
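The procedure above translates almost line for line into code. This is a minimal sketch that assumes the frequent item sets and their support counts are already available as a dict. The counts below are hypothetical. It skips $s = l$, which would give a rule with an empty consequent.

```python
from itertools import combinations

def generate_rules(frequent, min_conf):
    """frequent: {frozenset(itemset): support_count}. Returns (s, l - s, confidence)."""
    rules = []
    for l, l_count in frequent.items():
        if len(l) < 2:
            continue  # a single item cannot be split into antecedent and consequent
        # Every nonempty proper subset s of l gives a candidate rule s => (l - s)
        for r in range(1, len(l)):
            for subset in combinations(sorted(l), r):
                s = frozenset(subset)
                # s is frequent because every subset of a frequent set is frequent
                conf = l_count / frequent[s]  # support_count(l) / support_count(s)
                if conf >= min_conf:
                    rules.append((s, l - s, conf))
    return rules

# Hypothetical support counts for {I1, I2, I5} and all of its subsets
frequent = {
    frozenset({"I1"}): 6, frozenset({"I2"}): 7, frozenset({"I5"}): 2,
    frozenset({"I1", "I2"}): 4, frozenset({"I1", "I5"}): 2, frozenset({"I2", "I5"}): 2,
    frozenset({"I1", "I2", "I5"}): 2,
}
rules = generate_rules(frequent, min_conf=0.7)
for s, rest, conf in rules:
    print(sorted(s), "=>", sorted(rest), round(conf, 2))
```

Because the support counts are already known from the frequent item set step, no further database scan is needed to compute confidence.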
10.
Give a few techniques to improve the efficiency of the Apriori algorithm.
• Hash-based technique
• Transaction reduction
• Partitioning
• Sampling
• Dynamic itemset counting
11.
What factors limit the performance of the Apriori candidate generation technique?
• It may need to generate a huge number of candidate sets.
• It may need to repeatedly scan the database and check a large set of candidates by pattern matching.
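The first cost is easy to quantify. Joining $L_1$ with itself produces one candidate 2-item set per pair of frequent items. This sketch assumes a hypothetical 10,000 frequent 1-item sets. It also shows that a single frequent 100-item set implies $2^{100} - 1$ frequent nonempty subsets.

```python
from math import comb

n_frequent_1 = 10_000  # hypothetical number of frequent 1-item sets
candidates_2 = comb(n_frequent_1, 2)  # every pair of frequent items is a candidate
print(candidates_2)  # 49995000, i.e. about 5 * 10**7

# Every nonempty subset of a frequent 100-item set is itself frequent
print(f"{2**100 - 1:.2e}")
```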
12.
Describe the method of generating frequent item sets without candidate generation.
Frequent-pattern growth (or FP-growth) adopts a divide-and-conquer strategy.
Steps:
Compress the database representing frequent items into a frequent-pattern tree, or FP-tree.
Divide the compressed database into a set of conditional databases.
Mine each conditional database separately.
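The three steps can be sketched in a compact recursive form. This is a simplified illustration under a few assumptions. Transactions are passed as (items, count) pairs so that conditional databases can reuse the same tree builder. Ties in item frequency are broken by item name. The tree's node links are kept in a plain Python list per item. The toy transactions are hypothetical.

```python
from collections import defaultdict

class FPNode:
    """One FP-tree node: an item, its count, and links to parent and children."""
    def __init__(self, item, parent):
        self.item = item
        self.count = 0
        self.parent = parent
        self.children = {}

def build_fp_tree(weighted_transactions, min_count):
    """Compress (items, count) pairs into an FP-tree; return item counts and header table."""
    counts = defaultdict(int)
    for items, n in weighted_transactions:
        for item in set(items):
            counts[item] += n
    frequent = {i: c for i, c in counts.items() if c >= min_count}
    root = FPNode(None, None)
    header = defaultdict(list)  # item -> every tree node that holds the item
    for items, n in weighted_transactions:
        # Keep frequent items only, in descending frequency order (ties by name)
        ordered = sorted((i for i in set(items) if i in frequent),
                         key=lambda i: (-frequent[i], i))
        node = root
        for item in ordered:
            child = node.children.get(item)
            if child is None:
                child = FPNode(item, node)
                node.children[item] = child
                header[item].append(child)
            child.count += n  # shared prefixes are merged into one path
            node = child
    return frequent, header

def fp_growth(weighted_transactions, min_count, suffix=frozenset()):
    """Return {frozenset(itemset): support_count} for all frequent item sets."""
    frequent, header = build_fp_tree(weighted_transactions, min_count)
    patterns = {}
    for item, count in frequent.items():
        pattern = suffix | {item}
        patterns[pattern] = count
        # Conditional database: the prefix path above each node holding `item`
        conditional_db = []
        for node in header[item]:
            path, parent = [], node.parent
            while parent.item is not None:
                path.append(parent.item)
                parent = parent.parent
            if path:
                conditional_db.append((path, node.count))
        # Divide and conquer: mine each conditional database separately
        patterns.update(fp_growth(conditional_db, min_count, pattern))
    return patterns

# Hypothetical toy transactions
transactions = [
    {"I1", "I2", "I5"}, {"I2", "I4"}, {"I2", "I3"},
    {"I1", "I2", "I4"}, {"I1", "I3"}, {"I2", "I3"},
    {"I1", "I3"}, {"I1", "I2", "I3", "I5"}, {"I1", "I2", "I3"},
]
frequent = fp_growth([(t, 1) for t in transactions], min_count=2)
print(len(frequent))
```

Unlike Apriori, no candidate item sets are generated. Each pattern grows by extending a suffix with items that are frequent in its conditional database.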
13.
Define iceberg query.
An iceberg query computes an aggregate function over an attribute or set of attributes in order to find aggregate values above some specified threshold.
Given a relation R with attributes a1, a2, …, an and b, and an aggregate function agg_f, an iceberg query is of the form:
select R.a1, R.a2, …, R.an, agg_f(R.b)
from relation R
group by R.a1, R.a2, …, R.an
having agg_f(R.b) >= threshold
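A concrete iceberg query can be run with Python's built-in sqlite3 module. In this sketch, the table purchases(cust_id, item_id, qty) and its rows are hypothetical. They play the roles of a1, a2, b, and agg_f = SUM, with an assumed threshold of 3.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE purchases (cust_id TEXT, item_id TEXT, qty INTEGER)")
conn.executemany(
    "INSERT INTO purchases VALUES (?, ?, ?)",
    [("c1", "milk", 2), ("c1", "milk", 2), ("c1", "bread", 1),
     ("c2", "milk", 1), ("c2", "eggs", 3), ("c3", "bread", 1), ("c3", "bread", 1)],
)
threshold = 3
# Group by the attributes a1, a2 and keep only groups whose aggregate reaches the threshold
rows = conn.execute(
    """
    SELECT cust_id, item_id, SUM(qty)
    FROM purchases
    GROUP BY cust_id, item_id
    HAVING SUM(qty) >= ?
    ORDER BY cust_id, item_id
    """,
    (threshold,),
).fetchall()
print(rows)  # [('c1', 'milk', 4), ('c2', 'eggs', 3)]
conn.close()
```

Only the "tip of the iceberg" survives: two of the five groups reach the threshold.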
14.
Mention a few approaches to mining multilevel association rules.
• Uniform minimum support for all levels (or uniform support)
• Using reduced minimum support at lower levels (or reduced support)
• Level-by-level independent
• Level-cross filtering by single item
• Level-cross filtering by k-item set
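The difference between the first two approaches shows up with a two-level concept hierarchy. This is a minimal sketch. The items, their support values, and the thresholds (5% uniform versus 5% at level 1 and 3% at level 2) are all hypothetical.

```python
# Hypothetical two-level hierarchy: "computer" at level 1, its children at level 2
support = {"computer": 0.10, "laptop computer": 0.06, "desktop computer": 0.04}
level = {"computer": 1, "laptop computer": 2, "desktop computer": 2}

def frequent_items(min_sup_by_level):
    """Items whose support meets the minimum support assigned to their level."""
    return sorted(i for i, s in support.items() if s >= min_sup_by_level[level[i]])

uniform = frequent_items({1: 0.05, 2: 0.05})  # same threshold at every level
reduced = frequent_items({1: 0.05, 2: 0.03})  # lower threshold at the lower level
print(uniform)  # ['computer', 'laptop computer']
print(reduced)  # ['computer', 'desktop computer', 'laptop computer']
```

With uniform support, the less frequent lower-level item is missed. Reduced support at the lower level keeps it.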
15.
What are multidimensional association rules?
Association rules that involve two or more dimensions or predicates.
• Interdimension association rule: a multidimensional association rule with no repeated predicate or dimension.
• Hybrid-dimension association rule: a multidimensional association rule with multiple occurrences of some predicates or dimensions.
16.
Define constraint-based association mining.
Mining is performed under the guidance of various kinds of constraints provided by the user.
The constraints include the following:
• Knowledge type constraints
• Data constraints
• Dimension/level constraints
• Interestingness constraints
• Rule constraints
17.
Define the concept of classification.
Classification is a two-step process:
• A model is built describing a predefined set of data classes or concepts. The model is constructed by analyzing database tuples described by attributes.
• The model is used for classification.
18.
What is Decision tree?
A decision tree is a flowchart-like tree structure, where each internal node denotes a test on an attribute, each branch represents an outcome of the test, and leaf nodes represent classes or class distributions. The topmost node in a tree is the root node.
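The structure described above maps naturally onto nested dictionaries. In this sketch, internal nodes hold an attribute test and one branch per outcome, while leaves are plain class labels. The tree for a "buys computer" concept and its attribute values are hypothetical.

```python
# Hypothetical tree: internal nodes are dicts, leaves are class labels
tree = {
    "attribute": "age",  # root node: test on the attribute "age"
    "branches": {
        "youth": {"attribute": "student", "branches": {"no": "no", "yes": "yes"}},
        "middle_aged": "yes",
        "senior": {"attribute": "credit_rating",
                   "branches": {"fair": "yes", "excellent": "no"}},
    },
}

def classify(node, sample):
    """Follow one path from the root to a leaf and return the leaf's class."""
    while isinstance(node, dict):
        outcome = sample[node["attribute"]]  # apply the node's attribute test
        node = node["branches"][outcome]     # follow the branch for that outcome
    return node

print(classify(tree, {"age": "youth", "student": "yes"}))            # yes
print(classify(tree, {"age": "senior", "credit_rating": "excellent"}))  # no
```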
19.
What is Attribute Selection Measure?
The information gain measure is used to select the test attribute at each node in the decision tree. Such a measure is referred to as an attribute selection measure or a measure of the goodness of split.
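Information gain is the entropy of the class labels minus the weighted entropy that remains after splitting on an attribute. This sketch assumes rows are dicts. The four training rows and the attribute names are hypothetical.

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Expected information (in bits) needed to classify a sample: -sum p log2 p."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def information_gain(rows, attribute, target):
    """Entropy before the split minus the weighted entropy of each partition."""
    n = len(rows)
    expected = 0.0
    for value in set(r[attribute] for r in rows):
        subset = [r[target] for r in rows if r[attribute] == value]
        expected += len(subset) / n * entropy(subset)
    return entropy([r[target] for r in rows]) - expected

# Hypothetical training rows
rows = [
    {"outlook": "sunny", "wind": "weak", "play": "no"},
    {"outlook": "sunny", "wind": "strong", "play": "no"},
    {"outlook": "rain", "wind": "weak", "play": "yes"},
    {"outlook": "rain", "wind": "strong", "play": "yes"},
]
print(information_gain(rows, "outlook", "play"))  # 1.0: a perfect split
print(information_gain(rows, "wind", "play"))     # 0.0: no information
```

The attribute with the highest gain ("outlook" here) would be chosen as the test at this node.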
20.
Describe Tree pruning methods.
When a decision tree is built, many of the branches will reflect anomalies in the training data due to noise or outliers. Tree pruning methods address this problem of overfitting the data.
Approaches:
• Pre-pruning
• Post-pruning
21.
Define pre-pruning.
A tree is pruned by halting its construction early. Upon halting, the node becomes a leaf. The leaf may hold the most frequent class among the subset of samples.
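Pre-pruning can be sketched as an ID3-style tree builder that stops early. The halting criteria here, a maximum depth and a minimum information gain, are hypothetical parameters chosen for illustration. When it halts, the node becomes a leaf holding the majority class. The training rows are hypothetical.

```python
from collections import Counter
from math import log2

def entropy(labels):
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def gain(rows, attr, target):
    """Information gain of splitting `rows` on `attr`."""
    n = len(rows)
    rest = 0.0
    for v in set(r[attr] for r in rows):
        sub = [r[target] for r in rows if r[attr] == v]
        rest += len(sub) / n * entropy(sub)
    return entropy([r[target] for r in rows]) - rest

def build_tree(rows, attrs, target, depth=0, max_depth=2, min_gain=0.1):
    labels = [r[target] for r in rows]
    majority = Counter(labels).most_common(1)[0][0]
    # Halt if the node is pure, no attributes remain, or the depth limit is reached
    if len(set(labels)) == 1 or not attrs or depth >= max_depth:
        return majority
    best = max(attrs, key=lambda a: gain(rows, a, target))
    # Pre-pruning: halt if the best split is not informative enough
    if gain(rows, best, target) < min_gain:
        return majority  # the node becomes a leaf holding the most frequent class
    branches = {}
    for v in set(r[best] for r in rows):
        subset = [r for r in rows if r[best] == v]
        remaining = [a for a in attrs if a != best]
        branches[v] = build_tree(subset, remaining, target, depth + 1, max_depth, min_gain)
    return {"attribute": best, "branches": branches}

# Hypothetical training rows: 3 "yes", 1 "no"
rows = [
    {"color": "red", "label": "yes"}, {"color": "red", "label": "no"},
    {"color": "blue", "label": "yes"}, {"color": "blue", "label": "yes"},
]
pruned = build_tree(rows, ["color"], "label", min_gain=0.5)  # gain ≈ 0.31 < 0.5
grown = build_tree(rows, ["color"], "label", min_gain=0.1)
print(pruned)  # yes
print(grown)
```

Choosing the thresholds is the hard part of pre-pruning: set too high, they stop useful splits; set too low, they barely simplify the tree.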
22.
Define post-pruning.
Post-pruning removes branches from a "fully grown" tree. A tree node is pruned by removing its branches.
E.g., the cost complexity pruning algorithm.
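Cost complexity pruning, as used in CART, scores a tree $T$ by $R_\alpha(T) = R(T) + \alpha \cdot |\text{leaves}(T)|$, where $R(T)$ is its error rate and $\alpha \ge 0$ penalizes size. This is a minimal sketch of the pruning decision at one internal node. It compares the subtree against the single leaf that would replace it. The error rates, leaf count, and $\alpha$ values are hypothetical.

```python
def cost_complexity(error_rate, n_leaves, alpha):
    """R_alpha(T) = R(T) + alpha * (number of leaves)."""
    return error_rate + alpha * n_leaves

def should_prune(subtree_error, subtree_leaves, leaf_error, alpha):
    """Prune if replacing the subtree by one leaf does not increase cost complexity."""
    return (cost_complexity(leaf_error, 1, alpha)
            <= cost_complexity(subtree_error, subtree_leaves, alpha))

# Hypothetical node: its subtree has 4 leaves and error 0.10;
# collapsing it to a single leaf would give error 0.15
print(should_prune(0.10, 4, 0.15, alpha=0.01))  # False: the extra leaves are worth it
print(should_prune(0.10, 4, 0.15, alpha=0.05))  # True: size penalty dominates
```

Larger values of $\alpha$ favor smaller trees, so sweeping $\alpha$ produces a sequence of progressively pruned trees.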
23.
What is meant by a pattern?
A pattern represents knowledge.
24.
Define the concept of prediction.
Prediction can be viewed as the construction and use of a model to assess the class of an unlabeled sample, or to assess the value or value ranges of an attribute that a given sample is likely to have.