Data Mining and Data Warehousing
Unit I
Two marks questions with answers
- What is Data mining?
Data mining refers to extracting or “mining” knowledge from large amounts of data. It is
often treated as a synonym for another popular term, Knowledge Discovery in Databases
(KDD), although strictly it is one step of the KDD process.
- Give the steps involved in KDD.
KDD consists of an iterative sequence of the following steps:
Data cleaning
Data integration
Data selection
Data transformation
Data mining
Pattern Evaluation
Knowledge Presentation
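The steps above can be sketched as a toy pipeline. This is only an illustration: the two "stores", the field names, the spending threshold, and every helper function are hypothetical, and a real KDD process would iterate over these steps rather than run them once.

```python
# Toy walk-through of the KDD steps; all data and helper names are hypothetical.

# Two "sources" with slightly different conventions and one dirty record.
store_a = [
    {"customer": "ann", "item": "milk", "amount": "2.50"},
    {"customer": "bob", "item": "bread", "amount": None},  # missing value
]
store_b = [
    {"customer": "Ann", "item": "bread", "amount": "1.20"},
    {"customer": "cy", "item": "milk", "amount": "2.40"},
]

def clean(records):
    """Data cleaning: drop records with a missing amount."""
    return [r for r in records if r["amount"] is not None]

def integrate(*sources):
    """Data integration: combine several sources into one list."""
    return [r for source in sources for r in source]

def select(records, fields):
    """Data selection: keep only the task-relevant attributes."""
    return [{f: r[f] for f in fields} for r in records]

def transform(records):
    """Data transformation: normalize names and convert amounts to floats."""
    return [
        {"customer": r["customer"].lower(), "amount": float(r["amount"])}
        for r in records
    ]

def mine(records):
    """Data mining: total spending per customer (a very simple 'pattern')."""
    totals = {}
    for r in records:
        totals[r["customer"]] = totals.get(r["customer"], 0.0) + r["amount"]
    return totals

def evaluate(patterns, threshold):
    """Pattern evaluation: keep only customers at or above the threshold."""
    return {c: t for c, t in patterns.items() if t >= threshold}

def present(patterns):
    """Knowledge presentation: render the result as readable lines."""
    return [f"{c}: {t:.2f}" for c, t in sorted(patterns.items())]

integrated = integrate(clean(store_a), clean(store_b))
prepared = transform(select(integrated, ["customer", "amount"]))
report = present(evaluate(mine(prepared), threshold=2.45))
print("\n".join(report))
```

Each function corresponds to one step in the list, so the order of the calls mirrors the order of the KDD sequence.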
- Give the architecture of a typical data mining system.
The
architecture of a typical data mining system consists of the following
components:
Database, data warehouse, or other information
repository
Database or data warehouse server
Knowledge base
Data mining engine
Pattern evaluation module
Graphical user interface
- Define Database management system.
A database system, also called a database management system (DBMS), consists of a
collection of interrelated data, known as a database, and a set of software programs to
manage and access the data.
- Define relational database.
A relational database is a collection of tables, each of which is assigned a unique name.
Each table consists of a set of attributes (columns or fields) and usually
stores a large set of tuples (records or rows).
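As a small illustration of tables, attributes, and tuples, the sketch below uses Python's standard sqlite3 module with an in-memory database. The table name customer, its columns, and the sample rows are assumptions made up for this example.

```python
import sqlite3

# In-memory database, so nothing is written to disk.
conn = sqlite3.connect(":memory:")

# A table with a unique name and a set of attributes (columns).
conn.execute(
    "CREATE TABLE customer (cust_id INTEGER PRIMARY KEY, name TEXT, city TEXT)"
)

# Each inserted row is one tuple (record) of the table.
conn.executemany(
    "INSERT INTO customer VALUES (?, ?, ?)",
    [(1, "Ann", "Chennai"), (2, "Bob", "Madurai")],
)

# Attribute names as reported by the cursor description.
columns = [d[0] for d in conn.execute("SELECT * FROM customer").description]

# Tuples retrieved through a relational query.
rows = conn.execute(
    "SELECT name, city FROM customer ORDER BY cust_id"
).fetchall()

print(columns)
print(rows)
conn.close()
```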
- Define data warehouse.
A data warehouse is a repository of information collected from multiple sources, stored
under a unified schema, and usually residing at a single site. It is constructed
via a process of data cleaning, data transformation, data integration, data
loading, and periodic data refreshing.
- Define data mart and compare it with data warehouse.
A data mart is a departmental subset of a data warehouse. It focuses on selected subjects,
and thus its scope is department-wide. A data warehouse, on the other hand, collects
information about subjects that span an entire organization, and thus its scope
is enterprise-wide.
- Define transaction databases.
A transaction
database consists of a file where each record represents a transaction. A
transaction typically includes a unique transaction identity number and a list
of items making up the transaction.
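A minimal sketch of such a file follows, assuming a hypothetical comma-separated layout in which the first field is the transaction identity number and the remaining fields are the items. The identifiers and items are invented for illustration.

```python
import io

# Stand-in for a transaction file: one record (line) per transaction.
raw_file = io.StringIO(
    "T100,bread,milk\n"
    "T200,bread,butter,jam\n"
    "T300,milk\n"
)

transactions = {}
for line in raw_file:
    # First field is the transaction ID; the rest is the list of items.
    trans_id, *items = line.strip().split(",")
    transactions[trans_id] = items

for trans_id, items in transactions.items():
    print(trans_id, items)
```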
- Explain object-oriented databases.
Object-oriented databases are based on the object-oriented programming paradigm, where
each entity is considered as an object. Each object has associated with it the following:
A set of variables
A set of messages
A set of methods
- Explain spatial databases.
Spatial databases contain spatial-related information. Such databases include
geographic databases, VLSI chip design databases, and medical and satellite image
databases. Spatial data may be represented in raster format, consisting of
n-dimensional bit maps or pixel maps. Maps can also be represented in vector format,
where roads, bridges, and similar features are represented as unions of basic geometric
constructs such as points, lines, and polygons.
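The difference between the two representations can be illustrated with a short sketch. The pixel values, the road, and the park below are made-up examples, and the two helper functions are hypothetical names introduced only for this illustration.

```python
import math

# Raster format: a 2-D pixel map; 1 marks water, 0 marks land (made-up values).
raster = [
    [0, 0, 1, 1],
    [0, 1, 1, 1],
    [0, 0, 0, 1],
]
water_pixels = sum(sum(row) for row in raster)

# Vector format: a road as a polyline and a park as a polygon of (x, y) points.
road = [(0, 0), (3, 4), (3, 10)]
park = [(0, 0), (4, 0), (4, 3), (0, 3)]

def polyline_length(points):
    """Sum of the straight-line distances between consecutive points."""
    return sum(math.dist(a, b) for a, b in zip(points, points[1:]))

def polygon_area(points):
    """Area of a simple polygon via the shoelace formula."""
    shifted = points[1:] + points[:1]
    twice_area = sum(
        x1 * y2 - x2 * y1 for (x1, y1), (x2, y2) in zip(points, shifted)
    )
    return abs(twice_area) / 2

road_length = polyline_length(road)
park_area = polygon_area(park)
print(water_pixels, road_length, park_area)
```

The raster answers questions by counting cells, while the vector form answers them by geometry on the stored points.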
- Explain temporal and time-series databases.
A temporal database usually stores relational data that include time-related attributes.
These attributes may involve several timestamps, each having different
semantics. A time-series database stores sequences of values that change with
time, such as data collected regarding the stock exchange.
- Explain text databases and multimedia databases.
Text databases are databases that contain word descriptions for objects. These
descriptions are usually long sentences or paragraphs, such as product specifications
and error or bug reports. Multimedia databases store image, audio, and video data. They are
used in applications such as picture content-based retrieval, voice mail systems,
and the World Wide Web.
- Define legacy databases.
A legacy database is a group of heterogeneous databases that combines different kinds of
data systems, such as relational or object-oriented databases, hierarchical databases,
or file systems.
- Give the classification of data mining tasks.
Descriptive – characterize the general properties of the data in the database.
Predictive – perform inference on the current data in order to make predictions.
- Describe class/concept description.
Data can be
associated with classes or concepts. The individual classes can be described in
summarized, concise, and yet precise terms. Such descriptions of a class or a
concept are called class/concept descriptions. These descriptions can be
derived via data characterization or data discrimination.
- Define data characterization.
It is a summarization of the general characteristics or features of a target class of
data. The data corresponding to the user-specified class are typically
collected by a database query.
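A minimal sketch of characterization, assuming a hypothetical list of student records in which the target class is status == "graduate". The list comprehension stands in for the database query, and the chosen summary fields are only one possible summarization.

```python
from collections import Counter
from statistics import mean

# Hypothetical records; in practice these would come from a database.
students = [
    {"status": "graduate", "major": "CS", "gpa": 3.8},
    {"status": "graduate", "major": "CS", "gpa": 3.6},
    {"status": "graduate", "major": "Physics", "gpa": 3.4},
    {"status": "undergraduate", "major": "CS", "gpa": 3.0},
]

# Stand-in for the database query that collects the target class.
target = [s for s in students if s["status"] == "graduate"]

# Summarize the general characteristics of the target class.
summary = {
    "count": len(target),
    "mean_gpa": round(mean(s["gpa"] for s in target), 2),
    "majors": Counter(s["major"] for s in target),
}
print(summary)
```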
- Give the output forms of data characterization.
The output can be presented as pie charts, bar charts, curves, multidimensional data
cubes, and multidimensional tables, including cross tabs. The resulting descriptions can
also be presented as generalized relations or in rule form, called characteristic rules.
- Define data discrimination.
It is a comparison of the general features of target class data objects with the general
features of objects from one or a set of contrasting classes. The target and
contrasting classes are specified by the user, and the corresponding data
objects are retrieved through database queries.
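As a rough sketch of discrimination, the example below compares a target class with one contrasting class on a single feature. The records, the attribute names, and the profile helper are hypothetical, and the filter inside profile stands in for the database queries.

```python
from statistics import mean

# Hypothetical records covering a target and a contrasting class.
students = [
    {"status": "graduate", "age": 26},
    {"status": "graduate", "age": 28},
    {"status": "graduate", "age": 30},
    {"status": "undergraduate", "age": 19},
    {"status": "undergraduate", "age": 21},
]

def profile(status):
    """Retrieve one class (stand-in for a query) and summarize it."""
    rows = [s for s in students if s["status"] == status]
    return {"count": len(rows), "mean_age": mean(s["age"] for s in rows)}

target = profile("graduate")          # user-specified target class
contrast = profile("undergraduate")   # user-specified contrasting class

# The discrimination result is the comparison between the two profiles.
age_gap = target["mean_age"] - contrast["mean_age"]
print(target, contrast, age_gap)
```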
- What is an association analysis?
Association
analysis is the discovery of association rules showing attribute-value
conditions that occur frequently together in a given set of data. It is widely
used for market basket or transaction data analysis.
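Association rules are commonly scored with support and confidence. These two measures are not named in the answer above and are added here as standard background. The sketch computes them for a hypothetical rule bread => milk over four made-up market baskets.

```python
# Made-up market baskets; each set is one transaction.
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]

def support(itemset):
    """Fraction of transactions that contain every item in itemset."""
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)

def confidence(antecedent, consequent):
    """Of the transactions containing antecedent, the share that also
    contain consequent."""
    return support(antecedent | consequent) / support(antecedent)

rule_support = support({"bread", "milk"})
rule_confidence = confidence({"bread"}, {"milk"})
print(f"bread => milk: support={rule_support:.2f}, "
      f"confidence={rule_confidence:.2f}")
```

Here bread and milk occur together in 2 of 4 baskets, and 2 of the 3 baskets that contain bread also contain milk.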
- Define Classification.
It is the process of finding a set of models that describe and distinguish data classes or
concepts, for the purpose of being able to use the model to predict the class
of objects whose class label is unknown. The derived model is based on the
analysis of a set of training data.
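The idea of deriving a model from training data and then predicting unknown labels can be sketched with a nearest-centroid classifier. This technique is chosen here only because it is short; the answer above does not prescribe any particular method. The features (age, income in thousands), the labels, and the fit and predict helpers are all hypothetical.

```python
from collections import defaultdict

# Training data: (age, income in thousands) with a known class label.
training = [
    ((22, 18), "budget"),
    ((25, 22), "budget"),
    ((47, 85), "premium"),
    ((52, 90), "premium"),
]

def fit(samples):
    """Derive the model: one centroid (mean feature vector) per class."""
    grouped = defaultdict(list)
    for features, label in samples:
        grouped[label].append(features)
    return {
        label: tuple(sum(col) / len(rows) for col in zip(*rows))
        for label, rows in grouped.items()
    }

def predict(model, features):
    """Predict the class whose centroid is closest to features."""
    def sq_dist(centroid):
        return sum((a - b) ** 2 for a, b in zip(features, centroid))
    return min(model, key=lambda label: sq_dist(model[label]))

model = fit(training)
label_young = predict(model, (24, 20))   # object with unknown class label
label_older = predict(model, (50, 80))   # object with unknown class label
print(model)
print(label_young, label_older)
```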