Data Mining and Data Warehousing
Two Marks Questions with Answers
Unit III
1) Define support in association rule mining.
The rule A => B holds in the transaction set D with support s, where s is the percentage of transactions in D that contain A U B, i.e., both A and B. This is taken to be the probability P(A U B).
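The following is a minimal sketch of computing support on a made-up transaction set; the items, transactions, and variable names are assumptions chosen only for illustration.

    # Toy transaction database D (hypothetical items and transactions).
    D = [
        {"bread", "milk"},
        {"bread", "milk", "butter"},
        {"bread", "butter"},
        {"milk"},
    ]

    rule_items = {"bread", "milk"}  # A U B for the rule bread => milk

    # support(A => B) = fraction of transactions in D containing both A and B
    support = sum(1 for t in D if rule_items <= t) / len(D)
    print(support)  # 2 of the 4 transactions contain both items, so 0.5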
2) Define confidence.
The rule A => B has confidence c in the transaction set D if c is the percentage of transactions in D containing A that also contain B. This is taken to be the conditional probability P(B|A).
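Continuing in the same spirit, the sketch below computes confidence as the fraction of transactions containing A that also contain B; the toy data and names are again assumptions for illustration.

    # Toy transaction database D (hypothetical).
    D = [
        {"bread", "milk"},
        {"bread", "milk", "butter"},
        {"bread", "butter"},
        {"milk"},
    ]

    A, B = {"bread"}, {"milk"}                     # candidate rule A => B

    count_A = sum(1 for t in D if A <= t)          # transactions containing A
    count_AB = sum(1 for t in D if (A | B) <= t)   # transactions containing A and B

    confidence = count_AB / count_A                # conditional probability P(B|A)
    print(confidence)  # 2 / 3, roughly 0.67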
3) Define the occurrence frequency of an item set.
A set of items is referred to as an item set. The occurrence frequency of an item set is the number of transactions that contain the item set.
4) How are association rules mined from large databases?
Association rule mining is a two-step process:
Find all frequent item sets.
Generate strong association rules from the frequent item sets.
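A brute-force sketch of the two-step process is given below; the transactions, thresholds, and names are assumptions for illustration, not an optimized implementation.

    from itertools import combinations

    # Toy transaction database and thresholds (assumed values).
    D = [{"bread", "milk"}, {"bread", "milk", "butter"},
         {"bread", "butter"}, {"milk"}]
    min_sup, min_conf = 0.5, 0.7
    items = sorted(set().union(*D))

    def support(itemset):
        return sum(1 for t in D if itemset <= t) / len(D)

    # Step 1: find all frequent item sets (brute force over candidate sizes).
    frequent = [frozenset(c)
                for k in range(1, len(items) + 1)
                for c in combinations(items, k)
                if support(frozenset(c)) >= min_sup]

    # Step 2: generate strong rules A => B from each frequent item set.
    for itemset in frequent:
        for k in range(1, len(itemset)):
            for A in map(frozenset, combinations(itemset, k)):
                B = itemset - A
                conf = support(itemset) / support(A)
                if conf >= min_conf:
                    print(set(A), "=>", set(B),
                          "support=%.2f confidence=%.2f" % (support(itemset), conf))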
5) When does an item set satisfy minimum support?
An item set satisfies minimum support if the occurrence frequency of the item set is greater than or equal to the product of min_sup and the total number of transactions in D.
6) Define minimum support count.
The number of transactions required for an item set to satisfy minimum support is therefore referred to as the minimum support count. For example, if min_sup is 60% and D contains 5 transactions, the minimum support count is 0.6 × 5 = 3. If an item set satisfies minimum support, then it is a frequent item set.
7) Give the classification of association rules.
Based on the types of values handled in the rule.
Based on the dimensions of data involved in the rule.
Based on the levels of abstraction involved in the rule set.
Based on various extensions to association mining.
8) Define Frequent Closed Item Set.
An item set c is closed if there exists no proper superset c’ of c such that every transaction containing c also contains c’. A frequent closed item set is an item set that is both closed and frequent, i.e., it also satisfies minimum support.
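Under this definition, an item set is closed exactly when no single-item extension of it has the same support count. The sketch below illustrates that test on made-up transactions; the data and function names are assumptions.

    D = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}]   # toy transactions
    items = set().union(*D)

    def count(itemset):
        return sum(1 for t in D if itemset <= t)

    def is_closed(itemset):
        # Closed: no proper superset occurs in exactly the same transactions,
        # i.e. adding any further item strictly lowers the support count.
        return all(count(itemset | {extra}) < count(itemset)
                   for extra in items - itemset)

    print(is_closed(frozenset({"a"})))  # True: every extension lowers the count
    print(is_closed(frozenset({"b"})))  # False: {a, b} occurs wherever {b} does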
9) Define the Apriori property.
The Apriori property states that all non-empty subsets of a frequent item set must also be frequent. Equivalently, if an item set I does not satisfy the minimum support threshold min_sup, then I is not frequent, i.e., P(I) < min_sup. If an item A is added to the item set I, the resulting item set I U A cannot occur more frequently than I; therefore I U A is not frequent either, i.e., P(I U A) < min_sup.
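A small illustration of the property on made-up data (the items and transactions are assumptions): adding an item to I can only reduce, never increase, its support count.

    # Toy transaction database (hypothetical).
    D = [{"bread", "milk"}, {"bread", "milk", "butter"},
         {"bread", "butter"}, {"milk"}]

    def count(itemset):
        return sum(1 for t in D if itemset <= t)

    I = {"milk"}
    for extra in ("bread", "butter"):
        superset = I | {extra}
        # Every transaction containing I U {A} also contains I,
        # so the superset can never be more frequent than I.
        print(superset, count(superset), "<=", count(I))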
10) Define Anti-Monotone property.
If a set cannot pass a test, all of its supersets will fail the same test as well. The property is called anti-monotone because it is monotonic in the context of failing a test.
11) List the two-step process involved in the Apriori algorithm.
Join Step
Prune Step
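A sketch of the two steps is shown below, assuming the frequent (k-1)-item sets are already known; the helper name apriori_gen and the toy item sets are assumptions for illustration.

    from itertools import combinations

    def apriori_gen(frequent_prev, k):
        # Join step: union pairs of frequent (k-1)-item sets that differ by one item.
        candidates = {a | b
                      for a in frequent_prev
                      for b in frequent_prev
                      if len(a | b) == k}
        # Prune step (Apriori property): drop any candidate that has an
        # infrequent (k-1)-subset.
        return {c for c in candidates
                if all(frozenset(s) in frequent_prev
                       for s in combinations(c, k - 1))}

    # Toy frequent 2-item sets (assumed for illustration).
    L2 = {frozenset({"bread", "milk"}),
          frozenset({"bread", "butter"}),
          frozenset({"milk", "butter"})}
    print(apriori_gen(L2, 3))  # {frozenset({'bread', 'butter', 'milk'})}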
12) List the search strategies for mining multilevel associations with reduced support.
Level-by-level independent
Level-cross filtering by single item
Level-cross filtering by k-item set
13) Compare the level-by-level independent and level-cross filtering by k-item set strategies.
The level-by-level independent strategy can lead to examining numerous infrequent item sets at low levels, finding associations between items of little importance.
The level-cross filtering by k-item set strategy allows the mining system to examine only the children of frequent k-item sets. This restriction is very strong in that there usually are not many k-item sets that, when combined, are also frequent. Hence many valuable patterns may be filtered out using this approach.
14) Define single dimensional association rule.
buys(X, “IBM desktop computer”) => buys(X, “Sony b/w printer”)
The above rule is said to be a single-dimensional rule since it contains a single distinct predicate (e.g., buys) with multiple occurrences (i.e., the predicate occurs more than once within the rule). It is also known as an intra-dimension association rule.
15) Define multidimensional association rules.
Association rules that involve two or more dimensions or predicates are referred to as multidimensional association rules.
age(X, “20…29”) ^ occupation(X, “student”) => buys(X, “laptop”)
The above rule contains three predicates (age, occupation, buys), each of which occurs only once in the rule. There are no repeated predicates in the rule. Multidimensional association rules with no repeated predicates are called interdimension association rules.
16) Define categorical attribute.
Categorical attributes have a finite number of possible values, with no ordering among the values (e.g., occupation, brand). Categorical attributes are also called nominal attributes, since their values are “names of things”.
17) Define Quantitative Attributes.
Quantitative attributes are numeric and have an implicit ordering among values (e.g., age, income, and price).