Sunday, July 14, 2013

DATA WAREHOUSING AND DATA MINING, 2 Marks, Unit I



1.Define Data mining.
Data mining refers to extracting or “mining” knowledge from large amounts of data. It
is the process of discovering interesting knowledge from large amounts of data
stored in databases, data warehouses, or other information repositories.

2.Give some alternative terms for data mining.
• Knowledge mining
• Knowledge extraction
• Data/pattern analysis
• Data archaeology
• Data dredging

3.What is KDD?
KDD stands for Knowledge Discovery in Databases.

4.What are the steps involved in the KDD process?
• Data cleaning
• Data integration
• Data selection
• Data transformation
• Data mining
• Pattern evaluation
• Knowledge presentation
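
As a rough illustration, the steps can be chained in their usual order (cleaning, integration, selection, transformation, mining, evaluation, presentation). The sketch below is only a toy: the two store record lists, the function names, and the `min_qty` threshold are all made up for this example, and the "mining" step is just a total per item.

```python
from collections import Counter

# Hypothetical raw records from two sources; None marks a missing value.
store_a = [{"id": 1, "item": "milk", "qty": 2}, {"id": 2, "item": None, "qty": 1}]
store_b = [{"id": 3, "item": "Milk", "qty": 5}, {"id": 4, "item": "bread", "qty": 1}]

def clean(rows):
    """Data cleaning: drop records that contain missing values."""
    return [r for r in rows if all(v is not None for v in r.values())]

def integrate(*sources):
    """Data integration: combine several sources into one collection."""
    return [row for source in sources for row in source]

def select(rows, fields):
    """Data selection: keep only the attributes relevant to the task."""
    return [{f: r[f] for f in fields} for r in rows]

def transform(rows):
    """Data transformation: bring values into a consistent form."""
    return [{**r, "item": r["item"].lower()} for r in rows]

def mine(rows):
    """Data mining: here, simply the total quantity sold per item."""
    totals = Counter()
    for r in rows:
        totals[r["item"]] += r["qty"]
    return totals

def evaluate(patterns, min_qty):
    """Pattern evaluation: keep only results that pass a threshold."""
    return {item: qty for item, qty in patterns.items() if qty >= min_qty}

def present(patterns):
    """Knowledge presentation: render the result for the user."""
    return [f"{item}: {qty}" for item, qty in sorted(patterns.items())]

rows = transform(select(integrate(clean(store_a), clean(store_b)), ["item", "qty"]))
report = present(evaluate(mine(rows), min_qty=2))
print(report)
```

Each function stands in for one step, so the nesting order of the calls mirrors the order of the process.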

5.What is the use of the knowledge base?
The knowledge base holds domain knowledge that is used to guide the search or to evaluate
the interestingness of resulting patterns. Such knowledge can include concept hierarchies,
which organize attributes or attribute values into different levels of abstraction.

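6.Architecture of a typical data mining system.
• Database, data warehouse, or other information repository
• Database or data warehouse server
• Knowledge base
• Data mining engine
• Pattern evaluation module
• Graphical user interface (GUI)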

7.Mention some of the data mining techniques.
• Statistics
• Machine learning
• Decision Tree
• Hidden Markov models
• Artificial Intelligence
• Genetic Algorithm
• Meta learning

8.Give few statistical techniques.
• Point Estimation
• Data Summarization
• Bayesian Techniques
• Testing Hypothesis
• Correlation
• Regression
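
Three of these techniques (point estimation, correlation, and regression) can be shown on a tiny data set. This is a minimal sketch in plain Python; the sample values and the helper names `mean`, `pearson`, and `fit_line` are assumptions made for the example, not part of any particular library.

```python
from math import sqrt

# Hypothetical sample in which y is exactly 2 * x.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 6.0, 8.0, 10.0]

def mean(values):
    """Point estimation: the sample mean estimates the population mean."""
    return sum(values) / len(values)

def pearson(a, b):
    """Correlation: Pearson's coefficient, between -1 and 1."""
    ma, mb = mean(a), mean(b)
    cov = sum((p - ma) * (q - mb) for p, q in zip(a, b))
    var_a = sum((p - ma) ** 2 for p in a)
    var_b = sum((q - mb) ** 2 for q in b)
    return cov / sqrt(var_a * var_b)

def fit_line(a, b):
    """Regression: least-squares slope and intercept of b against a."""
    ma, mb = mean(a), mean(b)
    slope = sum((p - ma) * (q - mb) for p, q in zip(a, b)) / sum((p - ma) ** 2 for p in a)
    return slope, mb - slope * ma

slope, intercept = fit_line(x, y)
print(mean(x), pearson(x, y), slope, intercept)
```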

9.What is meta learning?
Meta learning is the concept of combining the predictions made by multiple data
mining models and analyzing those predictions to formulate a new and previously unknown
prediction.
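
One simple way to combine predictions is a majority vote. The sketch below assumes three hand-written rule "models" and a made-up customer record; real meta learners are usually trained rather than hard-coded, so treat this only as an illustration of the combining step.

```python
from collections import Counter

# Three hypothetical base models, each a simple rule over a customer record.
def model_age(customer):
    return "buys" if customer["age"] < 30 else "skips"

def model_income(customer):
    return "buys" if customer["income"] > 50000 else "skips"

def model_both(customer):
    return "buys" if customer["age"] < 40 and customer["income"] > 30000 else "skips"

def combine(models, customer):
    """Collect one prediction per model and return the most common label."""
    votes = Counter(model(customer) for model in models)
    return votes.most_common(1)[0][0]

customer = {"age": 35, "income": 60000}  # hypothetical record
models = [model_age, model_income, model_both]
print(combine(models, customer))
```

Here the age rule votes "skips" while the other two vote "buys", so the combined prediction differs from what the first model alone would say.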

10.Define Genetic algorithm.
• A genetic algorithm is a search algorithm.
• It locates an optimal binary string by processing an initial
random population of binary strings with operations such as
artificial mutation, crossover, and selection.
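
A minimal sketch of these three operations on binary strings follows. The fitness function (count of 1 bits), the parameter values, and keeping the best string unchanged in each generation are all assumptions chosen for the example; they are not prescribed by the definition above.

```python
import random

def fitness(bits):
    """Hypothetical fitness: the number of 1 bits in the string."""
    return sum(bits)

def evolve(length=12, pop_size=20, generations=40, mutation_rate=0.05, seed=0):
    rng = random.Random(seed)
    # Initial random population of binary strings.
    population = [[rng.randint(0, 1) for _ in range(length)] for _ in range(pop_size)]
    initial_best = max(population, key=fitness)
    for _ in range(generations):
        # Selection: the fitter half of the population becomes the parents.
        population.sort(key=fitness, reverse=True)
        parents = population[: pop_size // 2]
        # Carry the best string over unchanged so the best fitness never drops.
        children = [parents[0][:]]
        while len(children) < pop_size:
            mum, dad = rng.sample(parents, 2)
            # Crossover: splice the two parents at a random cut point.
            cut = rng.randint(1, length - 1)
            child = mum[:cut] + dad[cut:]
            # Mutation: flip each bit with a small probability.
            child = [1 - b if rng.random() < mutation_rate else b for b in child]
            children.append(child)
        population = children
    return initial_best, max(population, key=fitness)

initial_best, best = evolve()
print(fitness(initial_best), fitness(best))
```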

11.What is the purpose of Data mining Technique?
A data mining technique provides a way to carry out the various data mining tasks.

12.Define Predictive model.
A predictive model predicts the values of data by making use of known results from a
different set of sample data.

13.Data mining tasks that belong to the predictive model
• Classification
• Regression
• Time series analysis
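
To make the idea of predicting from known results concrete, here is a small classification sketch using a nearest-neighbour rule. The training pairs and the `predict` helper are hypothetical; nearest neighbour is just one easy choice of classifier.

```python
# Known results (hypothetical sample): hours studied and the observed outcome.
training = [(1.0, "fail"), (2.0, "fail"), (6.0, "pass"), (8.0, "pass")]

def predict(hours):
    """Return the outcome of the training example closest to `hours`."""
    nearest = min(training, key=lambda pair: abs(pair[0] - hours))
    return nearest[1]

# New data whose outcome is not yet known.
predictions = [predict(h) for h in (1.2, 7.5)]
print(predictions)
```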

14.Define descriptive model.
A descriptive model identifies the patterns and relationships in sample data. Data
mining tasks that belong to the descriptive model:
• Clustering
• Summarization
• Association rules
• Sequence discovery
 
15. Define the term summarization
Summarization condenses a large chunk of data, such as the contents of a web page or a
document, into a compact description.
Summarization is also called characterization or generalization.

16. List out the advanced database systems.
• Extended-relational databases
• Object-oriented databases
• Deductive databases
• Spatial databases
• Temporal databases
• Multimedia databases
• Active databases
• Scientific databases
• Knowledge databases

17. Define cluster analysis
Cluster analysis analyzes data objects without consulting a known class label. The class
labels are not present in the training data simply because they are not known to begin
with.
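
A small k-means sketch on one-dimensional points shows grouping without class labels. The points, the starting centroids, and the fixed iteration count are assumptions picked for the example; k-means is only one of many clustering methods.

```python
def kmeans_1d(points, centroids, iterations=10):
    """Group points around centroids; no class labels are used anywhere."""
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for p in points:
            # Assign each point to its nearest centroid.
            nearest = min(range(len(centroids)), key=lambda i: abs(p - centroids[i]))
            clusters[nearest].append(p)
        # Move each centroid to the mean of its cluster (keep it if the cluster is empty).
        centroids = [
            sum(c) / len(c) if c else centroids[i] for i, c in enumerate(clusters)
        ]
    return centroids, clusters

points = [1.0, 1.5, 2.0, 9.0, 10.0, 11.0]  # hypothetical unlabeled data
centroids, clusters = kmeans_1d(points, centroids=[0.0, 5.0])
print(centroids, clusters)
```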

18.Classification of Data mining systems.
• Based on the kinds of databases mined
   o According to data model
      - Relational mining system
      - Transactional mining system
      - Object-oriented mining system
      - Object-relational mining system
      - Data warehouse mining system
   o According to types of data
      - Spatial data mining system
      - Time series data mining system
      - Text data mining system
      - Multimedia data mining system
• Based on the kinds of knowledge mined
   o According to functionalities
      - Characterization
      - Discrimination
      - Association
      - Classification
      - Clustering
      - Outlier analysis
      - Evolution analysis
   o According to levels of abstraction of the knowledge mined
      - Generalized knowledge (high level of abstraction)
      - Primitive-level knowledge (raw data level)
   o According to whether data regularities or data irregularities are mined
• Based on the kinds of techniques utilized
   o According to user interaction
      - Autonomous systems
      - Interactive exploratory systems
      - Query-driven systems
   o According to methods of data analysis
      - Database-oriented
      - Data warehouse-oriented
      - Machine learning
      - Statistics
      - Visualization
      - Pattern recognition
      - Neural networks
• Based on the applications adopted
   o Finance
   o Telecommunication
   o DNA
   o Stock markets
   o E-mail and so on

19.Describe challenges to data mining regarding data mining methodology and user
interaction issues.
• Mining different kinds of knowledge in databases
• Interactive mining of knowledge at multiple levels of abstraction
• Incorporation of background knowledge
• Data mining query languages and ad hoc data mining
• Presentation and visualization of data mining results
• Handling noisy or incomplete data
• Pattern evaluation

20.Describe challenges to data mining regarding performance issues.
• Efficiency and scalability of data mining algorithms
• Parallel, distributed, and incremental mining algorithms

21.Describe issues relating to the diversity of database types.
• Handling of relational and complex types of data
• Mining information from heterogeneous databases and global information
systems

22.What is meant by pattern?
A pattern represents knowledge if it is easily understood by humans, valid on test
data with some degree of certainty, and potentially useful, novel, or validates a hunch
about which the user was curious. Measures of pattern interestingness, either objective or
subjective, can be used to guide the discovery process.
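
Support and confidence are two widely used objective measures of interestingness for association patterns. The sketch below computes both over a handful of hypothetical market-basket transactions; the data and the helper names are made up for illustration.

```python
# Hypothetical market-basket transactions.
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]

def support(itemset):
    """Fraction of transactions that contain every item in `itemset`."""
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(antecedent, consequent):
    """Of the transactions containing `antecedent`, the fraction that also contain `consequent`."""
    return support(antecedent | consequent) / support(antecedent)

rule_support = support({"bread", "milk"})
rule_confidence = confidence({"bread"}, {"milk"})
print(rule_support, rule_confidence)
```

A pattern such as "bread implies milk" would then be kept or discarded by comparing these values with user-chosen thresholds.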

23.How is a data warehouse different from a database?
A data warehouse is a repository of data from multiple heterogeneous sources, organized
under a unified schema at a single site in order to facilitate management decision-making.
A database consists of a collection of interrelated data.
