My View On Computers and World: DWDM


Sunday, July 14, 2013

Data Warehousing and Data Mining, 16-Mark Questions with Hints

UNIT-I

1. Explain the evolution of database technology.

_ Data collection and Database creation

_ Database management systems

_ Advanced database systems

_ Data warehousing and Data Mining

_ Web-based Database systems

_ New generation of Integrated information systems

2. Explain the steps of knowledge discovery in databases (KDD).

_ Data cleaning

_ Data integration

_ Data selection

_ Data transformation

_ Data mining

_ Pattern evaluation

_ Knowledge presentation
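The KDD steps above can be sketched as a small pipeline, one function per step. This is a toy illustration under assumed data: the two sources, the attribute names (`cust_id`, `age`, `item`, `region`) and the "frequent (age band, item) pair" mining step are all hypothetical, chosen only to show where each step fits.

```python
from collections import Counter

# Hypothetical raw data from two sources; integration joins them on cust_id.
sales = [
    {"cust_id": 1, "item": "milk"},
    {"cust_id": 2, "item": "bread"},
    {"cust_id": 3, "item": None},   # noisy record, removed by cleaning
    {"cust_id": 4, "item": "milk"},
]
customers = {1: {"age": 23, "region": "N"}, 2: {"age": 41, "region": "S"},
             3: {"age": 35, "region": "N"}, 4: {"age": 29, "region": "S"}}

def clean(records):
    # Data cleaning: drop records with missing values.
    return [r for r in records if all(v is not None for v in r.values())]

def integrate(records, lookup):
    # Data integration: combine sales records with customer data from a second source.
    return [{**r, **lookup[r["cust_id"]]} for r in records if r["cust_id"] in lookup]

def select(records, attrs):
    # Data selection: keep only the attributes relevant to the analysis task.
    return [{a: r[a] for a in attrs} for r in records]

def transform(records):
    # Data transformation: generalize numeric age into an age band.
    return [{"age_band": "young" if r["age"] < 30 else "middle", "item": r["item"]}
            for r in records]

def mine(records):
    # Data mining: count (age_band, item) co-occurrences.
    return Counter((r["age_band"], r["item"]) for r in records)

def evaluate(patterns, min_count):
    # Pattern evaluation: keep only patterns meeting a frequency threshold.
    return {p: c for p, c in patterns.items() if c >= min_count}

def present(patterns):
    # Knowledge presentation: render the surviving patterns as readable lines.
    return [f"{band} customers buy {item} ({count}x)"
            for (band, item), count in sorted(patterns.items())]

data = transform(select(integrate(clean(sales), customers), ["age", "item"]))
result = present(evaluate(mine(data), min_count=2))
print(result)  # ['young customers buy milk (2x)']
```

In practice each step is a substantial process of its own; the point here is only the order in which the steps feed one another.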

3. Explain the architecture of a data mining system.

_ Database, data warehouse, or other information repository

_ Database or data warehouse server

_ Knowledge base

_ Data mining engine

_ Pattern evaluation module

_ Graphical user interface

4. Explain the various tasks in data mining.

(Or)

Explain the taxonomy of data mining tasks.

_ Predictive modeling

• Classification

• Regression

• Time series analysis

_ Descriptive modeling

• Clustering

• Summarization

• Association rules

• Sequence discovery

5. Explain the various techniques in data mining.

_ Statistics (or) statistical perspectives

• Point estimation

• Data summarization

• Bayesian techniques

• Hypothesis testing

• Correlation

• Regression

_ Machine learning

_ Decision trees

_ Hidden Markov models

_ Artificial neural networks

_ Genetic algorithms

_ Meta learning

UNIT-II

6. Explain the issues regarding classification and prediction.

_ Preparing the data for classification and prediction

o Data cleaning

o Relevance analysis

o Data transformation

_ Comparing classification methods

o Predictive accuracy

o Speed

o Robustness

o Scalability

o Interpretability
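Two of the points above can be made concrete. The sketch below shows min-max normalization, one common data transformation, and predictive accuracy measured as the fraction of test tuples whose class is predicted correctly. The function names and sample values are hypothetical.

```python
def min_max_normalize(values, new_min=0.0, new_max=1.0):
    """Linearly rescale values into [new_min, new_max] (a data transformation step)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # All values are equal: map everything to new_min to avoid dividing by zero.
        return [new_min for _ in values]
    scale = (new_max - new_min) / (hi - lo)
    return [new_min + (v - lo) * scale for v in values]

def accuracy(predicted, actual):
    """Predictive accuracy: fraction of test tuples whose class is predicted correctly."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("need two non-empty lists of equal length")
    correct = sum(p == a for p, a in zip(predicted, actual))
    return correct / len(actual)

incomes = [12000, 73600, 98000]
print(min_max_normalize(incomes))   # roughly [0.0, 0.716, 1.0]
print(accuracy(["yes", "no", "yes", "yes"], ["yes", "no", "no", "yes"]))  # 0.75
```

Normalizing attributes to a common range keeps attributes with large values (such as income) from outweighing those with small values in distance-based methods.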

7. Explain classification by decision tree induction.

_ Decision tree induction

_ Attribute selection measure.

_ Tree pruning

_ Extracting classification rules from decision trees
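Information gain is one widely used attribute selection measure. With Info(D) = -Σ p_i log2(p_i) over the class distribution, and Info_A(D) the size-weighted entropy of the partitions produced by attribute A, the gain is Gain(A) = Info(D) - Info_A(D). A minimal sketch, assuming categorical attributes and tuples stored as Python dicts; the tiny training set and attribute names are made up.

```python
import math
from collections import Counter

def entropy(labels):
    """Info(D) = -sum(p_i * log2(p_i)) over the class distribution of labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in Counter(labels).values())

def info_gain(rows, attr, target):
    """Gain(A) = Info(D) - Info_A(D), weighting each partition by its size."""
    partitions = {}
    for r in rows:
        partitions.setdefault(r[attr], []).append(r[target])
    expected = sum(len(p) / len(rows) * entropy(p) for p in partitions.values())
    return entropy([r[target] for r in rows]) - expected

def best_attribute(rows, attrs, target):
    """Attribute selection: choose the attribute with the highest information gain."""
    return max(attrs, key=lambda a: info_gain(rows, a, target))

rows = [
    {"student": "no",  "credit": "fair",      "buys": "no"},
    {"student": "no",  "credit": "excellent", "buys": "no"},
    {"student": "yes", "credit": "fair",      "buys": "yes"},
    {"student": "yes", "credit": "excellent", "buys": "yes"},
    {"student": "no",  "credit": "fair",      "buys": "yes"},
]
print(best_attribute(rows, ["student", "credit"], "buys"))  # student
```

Decision tree induction applies this choice recursively: the selected attribute becomes the test at the current node, and the same procedure runs on each partition.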

8. Write short notes on patterns.

_ Pattern definition

_ Objective measures

_ Subjective measures

_ Can a data mining system generate all of the interesting patterns?

_ Can a data mining system generate only interesting patterns?
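Support and confidence are two common objective measures of interestingness for association rules. For a rule A ⇒ B, support is the fraction of transactions containing both A and B, and confidence is support(A ∪ B) / support(A). A minimal sketch with hypothetical transactions (it assumes the antecedent occurs at least once, otherwise confidence is undefined):

```python
def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(antecedent, consequent, transactions):
    """Confidence of antecedent => consequent, i.e. P(consequent | antecedent)."""
    both = support(set(antecedent) | set(consequent), transactions)
    return both / support(antecedent, transactions)

transactions = [
    {"computer", "software", "printer"},
    {"computer", "software"},
    {"computer", "printer"},
    {"software"},
]
print(support({"computer", "software"}, transactions))       # 0.5
print(confidence({"computer"}, {"software"}, transactions))  # about 0.667
```

A rule is usually reported as interesting only when both values meet user-specified minimum thresholds.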

9. Explain mining single-dimensional Boolean association rules from transactional databases.

_ The Apriori algorithm: finding frequent itemsets using candidate generation

_ Mining frequent item sets without candidate generation

10. Explain the Apriori algorithm.

_ Apriori property

_ Join step

_ Prune step

_ Example

_ Algorithm
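A compact Python sketch of Apriori follows, assuming transactions are sets of items and `min_support` is an absolute count. The join step here is simplified: it unions any two frequent (k-1)-itemsets whose union has size k; combined with the prune step (every (k-1)-subset of a candidate must be frequent, the Apriori property) it yields the same candidates as the classic join. The database is the usual nine-transaction textbook example; this is an illustration, not an optimized implementation.

```python
from itertools import combinations

def apriori(transactions, min_support):
    """Return {frozenset(itemset): count} for every itemset with count >= min_support."""
    transactions = [frozenset(t) for t in transactions]

    def count_and_filter(candidates):
        # One scan of the database: count how many transactions contain each candidate.
        counts = {c: 0 for c in candidates}
        for t in transactions:
            for c in candidates:
                if c <= t:
                    counts[c] += 1
        return {c: n for c, n in counts.items() if n >= min_support}

    # L1: frequent 1-itemsets.
    frequent = count_and_filter({frozenset([i]) for t in transactions for i in t})
    result = dict(frequent)
    k = 2
    while frequent:
        prev = list(frequent)
        # Join step (simplified): union pairs of frequent (k-1)-itemsets into k-itemsets.
        candidates = {a | b for a in prev for b in prev if len(a | b) == k}
        # Prune step (Apriori property): all (k-1)-subsets must be frequent.
        candidates = {c for c in candidates
                      if all(frozenset(s) in frequent for s in combinations(c, k - 1))}
        frequent = count_and_filter(candidates)
        result.update(frequent)
        k += 1
    return result

db = [{"I1", "I2", "I5"}, {"I2", "I4"}, {"I2", "I3"}, {"I1", "I2", "I4"}, {"I1", "I3"},
      {"I2", "I3"}, {"I1", "I3"}, {"I1", "I2", "I3", "I5"}, {"I1", "I2", "I3"}]
freq = apriori(db, min_support=2)
print(sorted(sorted(s) for s in freq if len(s) == 3))
# [['I1', 'I2', 'I3'], ['I1', 'I2', 'I5']]
```

Note that {I1, I3, I5} is never counted: its subset {I3, I5} is infrequent, so the prune step removes it before the database scan.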

11. Explain how the efficiency of Apriori can be improved.

_ Hash-based technique (hashing itemset counts)

_ Transaction reduction (reducing the number of transactions scanned in future iterations)

_ Partitioning (partitioning the data to find candidate itemsets)

_ Sampling (mining on a subset of the given data)

_ Dynamic itemset counting (adding candidate itemsets at different points during a scan)
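The hash-based technique can be sketched as follows: while the first scan counts 1-itemsets, every 2-itemset of each transaction is hashed into a bucket and that bucket's count is incremented. A 2-itemset whose bucket count is below the minimum support cannot be frequent, so it is dropped from the candidate set C2. The hash function h(x, y) = (order(x) × 10 + order(y)) mod 7, the alphabetical item order and the small database are assumptions for this example.

```python
from itertools import combinations

def hash_filtered_c2(transactions, min_support, num_buckets=7):
    """Candidate 2-itemsets that survive hash-bucket filtering (first-scan sketch)."""
    all_items = sorted({i for t in transactions for i in t})
    order = {item: pos + 1 for pos, item in enumerate(all_items)}  # assumed item order

    def h(x, y):
        # Hypothetical hash function on an ordered pair of items.
        return (order[x] * 10 + order[y]) % num_buckets

    buckets = [0] * num_buckets
    item_counts = {}
    for t in transactions:
        for item in t:
            item_counts[item] = item_counts.get(item, 0) + 1
        # Hash every 2-itemset of the transaction into a bucket.
        for x, y in combinations(sorted(t, key=order.get), 2):
            buckets[h(x, y)] += 1

    l1 = sorted((i for i, c in item_counts.items() if c >= min_support), key=order.get)
    # A pair from L1 x L1 stays a candidate only if its bucket count reaches min_support.
    return [(x, y) for x, y in combinations(l1, 2) if buckets[h(x, y)] >= min_support]

db = [{"a", "b"}, {"a", "b"}, {"a", "c"}, {"b", "d"}, {"c", "d"}, {"a", "d"}]
print(hash_filtered_c2(db, min_support=2))
# [('a', 'b'), ('a', 'c'), ('c', 'd')]
```

Here ('a', 'c') and ('c', 'd') each occur only once but collide in bucket 6, so they survive the filter. Hashing only removes candidates that are guaranteed to be infrequent; the survivors are still counted exactly in the next scan.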

12. Explain mining frequent itemsets without candidate generation.

_ Frequent patterns growth (or) FP-growth

_ Frequent pattern tree (or) FP-tree

_ Algorithm
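FP-growth can be sketched in Python as below, assuming each transaction is passed as an (itemset, count) pair so the same function can mine both the original database and every conditional pattern base. Items are inserted into the FP-tree in descending order of support (ties broken by item name, an assumption), and each item's conditional pattern base is collected by following its node-links and walking up the prefix paths. The database is the same nine-transaction textbook example used for Apriori.

```python
from collections import defaultdict

class Node:
    """FP-tree node: an item, its count, a link to its parent and its children."""
    def __init__(self, item, parent):
        self.item, self.count, self.parent, self.children = item, 0, parent, {}

def build_tree(weighted_transactions, min_support):
    """Build an FP-tree; return (header table of node-links, frequent item counts)."""
    counts = defaultdict(int)
    for items, n in weighted_transactions:
        for i in items:
            counts[i] += n
    frequent = {i: c for i, c in counts.items() if c >= min_support}
    root = Node(None, None)
    header = defaultdict(list)  # item -> every tree node holding that item
    for items, n in weighted_transactions:
        # Keep only frequent items, in descending support order (ties: by item name).
        path = sorted((i for i in items if i in frequent), key=lambda i: (-frequent[i], i))
        node = root
        for i in path:
            if i not in node.children:
                node.children[i] = Node(i, node)
                header[i].append(node.children[i])
            node = node.children[i]
            node.count += n
    return header, frequent

def fp_growth(weighted_transactions, min_support, suffix=()):
    """Mine frequent itemsets from (items, count) pairs; return {frozenset: count}."""
    header, frequent = build_tree(weighted_transactions, min_support)
    result = {}
    for item, support in frequent.items():
        pattern = suffix + (item,)
        result[frozenset(pattern)] = support
        # Conditional pattern base: the prefix path above each node of this item.
        base = []
        for node in header[item]:
            path, p = [], node.parent
            while p.item is not None:
                path.append(p.item)
                p = p.parent
            if path:
                base.append((path, node.count))
        # Recurse on the conditional FP-tree built from this pattern base.
        result.update(fp_growth(base, min_support, pattern))
    return result

db = [{"I1", "I2", "I5"}, {"I2", "I4"}, {"I2", "I3"}, {"I1", "I2", "I4"}, {"I1", "I3"},
      {"I2", "I3"}, {"I1", "I3"}, {"I1", "I2", "I3", "I5"}, {"I1", "I2", "I3"}]
freq = fp_growth([(t, 1) for t in db], min_support=2)
print(len(freq))                            # 13
print(freq[frozenset({"I1", "I2", "I5"})])  # 2
```

No candidate itemsets are generated: each pattern is grown from a suffix by mining only the conditional database of transactions that contain it.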

13. Explain mining multilevel Boolean association rules from transaction databases.

_ Multilevel association rules

_ Approaches to mining Multilevel association rules

• Using uniform minimum support for all levels

• Using reduced minimum support at lower levels

o Level-by-level independent

o Level-cross filtering by single item

o Level-cross filtering by k-itemset

_ Checking for redundant Multilevel association rules

14. Explain constraint-based association mining.

_ Knowledge type constraints

_ Data constraints

_ Dimension/level constraints

_ Interestingness constraints

_ Rule constraints

_ Metarule-guided mining of association rules

_ Mining guided by additional rule constraints

UNIT-III

15. Explain regression in predictive modeling.

_ Regression definition

_ Linear regression

_ Multiple regression

_ Non-linear regression

_ Other regression models
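Linear regression models a response y as a straight-line function of one predictor x, y = w0 + w1·x. The method of least squares gives w1 = Σ(xi - mean(x))(yi - mean(y)) / Σ(xi - mean(x))² and w0 = mean(y) - w1·mean(x). A minimal sketch with hypothetical data:

```python
def fit_line(xs, ys):
    """Least-squares estimates (w0, w1) for y = w0 + w1 * x."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    w1 = sxy / sxx
    w0 = y_mean - w1 * x_mean
    return w0, w1

# Hypothetical data lying exactly on y = 2 + 3x.
xs = [1, 2, 3, 4]
ys = [5, 8, 11, 14]
w0, w1 = fit_line(xs, ys)
print(w0, w1)        # 2.0 3.0
print(w0 + w1 * 5)   # prediction for x = 5: 17.0
```

Multiple regression extends the same idea to several predictor variables; non-linear relationships can often be handled by transforming the variables first.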

16. Explain the statistical perspective in data mining.

_ Point estimation

_ Data summarization

_ Bayesian techniques

_ Hypothesis testing

_ Regression

_ Correlation
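Two of these tools can be shown concretely: point estimation of a population's mean and variance from a sample, and the Pearson correlation coefficient r, which measures the strength of a linear relationship between two attributes (r near +1 or -1 indicates a strong linear relationship, r near 0 little linear relationship). The sample values are hypothetical, and the unbiased (n - 1) variance estimator is one common choice.

```python
import math

def point_estimates(sample):
    """Point estimates of the mean and the (unbiased) variance from a sample."""
    n = len(sample)
    mean = sum(sample) / n
    variance = sum((x - mean) ** 2 for x in sample) / (n - 1)
    return mean, variance

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length attribute lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

print(point_estimates([2, 4, 4, 4, 5, 5, 7, 9]))  # mean 5.0, variance 32/7
print(pearson_r([1, 2, 3, 4], [2, 4, 6, 8]))      # close to 1.0
```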

17. Explain Bayesian classification.

_ Bayes' theorem

_ Naïve Bayesian classification

_ Bayesian belief networks

_ Bayesian learning
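Naïve Bayesian classification predicts the class C that maximizes P(C) · Π P(xk | C), assuming attribute values are conditionally independent given the class. The sketch below estimates probabilities by relative frequency for categorical attributes and applies the Laplacian correction (add-one smoothing) so that an unseen value does not force a zero probability. The training tuples and attribute names are hypothetical.

```python
from collections import Counter, defaultdict

def train(rows, target):
    """Count classes and, per class, the values of each attribute."""
    class_counts = Counter(r[target] for r in rows)
    value_counts = defaultdict(Counter)  # (class, attribute) -> Counter of values
    distinct = defaultdict(set)          # attribute -> distinct values seen in training
    for r in rows:
        for attr, v in r.items():
            if attr != target:
                value_counts[(r[target], attr)][v] += 1
                distinct[attr].add(v)
    return class_counts, value_counts, distinct

def classify(model, x):
    """Pick the class maximizing P(C) * prod P(x_k | C), with the Laplacian correction."""
    class_counts, value_counts, distinct = model
    total = sum(class_counts.values())
    best, best_score = None, -1.0
    for c, n_c in class_counts.items():
        score = n_c / total
        for attr, v in x.items():
            # Add-one smoothing keeps an unseen value from zeroing the whole product.
            score *= (value_counts[(c, attr)][v] + 1) / (n_c + len(distinct[attr]))
        if score > best_score:
            best, best_score = c, score
    return best

rows = [
    {"age": "youth",  "student": "no",  "buys": "no"},
    {"age": "youth",  "student": "no",  "buys": "no"},
    {"age": "middle", "student": "no",  "buys": "yes"},
    {"age": "senior", "student": "yes", "buys": "yes"},
    {"age": "youth",  "student": "yes", "buys": "yes"},
    {"age": "senior", "student": "no",  "buys": "no"},
]
model = train(rows, "buys")
print(classify(model, {"age": "youth", "student": "yes"}))  # yes
```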

18. Discuss the requirements of clustering in data mining.

_ Scalability

_ Ability to deal with different types of attributes

_ Discovery of clusters with arbitrary shape

_ Minimal requirements for domain knowledge to determine input parameters

_ Ability to deal with noisy data

_ Insensitivity to the order of input records

_ High dimensionality

_ Interpretability and usability

19. Explain the types of data in cluster analysis.

_ Interval-scaled variables

_ Binary variables

o Symmetric binary variables

o Asymmetric binary variables

_ Nominal variables

_ Ordinal variables

_ Ratio-scaled variables
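Two of the variable types above call for specific handling. Interval-scaled variables are often standardized with a z-score, z = (x - m) / s, where s here is the mean absolute deviation, which is less sensitive to outliers than the standard deviation. For asymmetric binary variables, the dissimilarity d(i, j) = (r + s) / (q + r + s) ignores negative matches, where q counts attributes equal to 1 for both objects and r, s count the 1/0 and 0/1 mismatches. The two patient records are hypothetical, and the sketch assumes at least one attribute is 1 for one of the objects.

```python
def standardize(values):
    """z-scores for an interval-scaled variable, using the mean absolute deviation."""
    n = len(values)
    mean = sum(values) / n
    mad = sum(abs(v - mean) for v in values) / n
    return [(v - mean) / mad for v in values]

def asymmetric_binary_dissimilarity(a, b):
    """d(i, j) = (r + s) / (q + r + s); 0/0 (negative) matches are ignored."""
    q = sum(1 for x, y in zip(a, b) if x == 1 and y == 1)
    r = sum(1 for x, y in zip(a, b) if x == 1 and y == 0)
    s = sum(1 for x, y in zip(a, b) if x == 0 and y == 1)
    return (r + s) / (q + r + s)

print(standardize([2, 4, 6]))  # approximately [-1.5, 0.0, 1.5]
# Hypothetical test results for two patients (1 = positive/yes, 0 = negative/no).
print(asymmetric_binary_dissimilarity([1, 0, 1, 0, 0, 0], [1, 0, 1, 0, 1, 0]))  # 1/3
```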

20. Explain the partitioning method of clustering.

_ K-means clustering

_ K-medoids clustering
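A minimal k-means sketch follows. It assumes points are tuples of numbers and Euclidean distance, takes the first k points as initial centers (a simplification; random initial centers are more usual), and stops when the assignments no longer change or after `max_iter` rounds. K-medoids differs in that each cluster is represented by an actual data object rather than by the mean of its members.

```python
import math

def k_means(points, k, max_iter=100):
    """Cluster points into k groups; returns (centers, labels)."""
    centers = [tuple(p) for p in points[:k]]  # simple deterministic initialization
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest center.
        new_labels = [min(range(k), key=lambda c: math.dist(p, centers[c])) for p in points]
        if new_labels == labels:
            break  # assignments are stable: converged
        labels = new_labels
        # Update step: move each center to the mean of its assigned points.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old center if a cluster becomes empty
                centers[c] = tuple(sum(coord) / len(members) for coord in zip(*members))
    return centers, labels

points = [(1, 1), (9, 9), (1.5, 2), (8, 8), (1, 0.5), (9, 8.5)]
centers, labels = k_means(points, k=2)
print(labels)  # [0, 1, 0, 1, 0, 1]
```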

21. Explain Visualization in data mining.

Various forms of visualizing the discovered patterns

_ Rules

_ Table

_ Crosstab

_ Pie chart

_ Bar chart

_ Decision tree

_ Data cube

_ Histogram

_ Quantile plots

_ q-q plots

_ Scatter plots

_ Loess curves

UNIT-IV

22. Discuss the key characteristics of a data warehouse.

_ Subject-oriented

_ Integrated

_ Time-Variant

_ Non-volatile

23. List out the differences between OLTP and OLAP.

_ Users and system orientation

_ Data contents

_ Database design

_ View

_ Access patterns

24. Discuss the various schemas for the multidimensional data model.

_ Star schema

_ Snowflake schema

_ Fact constellation schema
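A star schema can be illustrated with Python's built-in `sqlite3` module: one central fact table holds the measure and a foreign key to each dimension table. The table and column names (`sales_fact`, `time_dim`, `item_dim`, `dollars_sold`) and the rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.executescript("""
CREATE TABLE time_dim (time_key INTEGER PRIMARY KEY, quarter TEXT, year INTEGER);
CREATE TABLE item_dim (item_key INTEGER PRIMARY KEY, item_name TEXT, brand TEXT,
                       item_type TEXT);
-- Central fact table: a key to each dimension plus the numeric measure.
CREATE TABLE sales_fact (
    time_key INTEGER REFERENCES time_dim(time_key),
    item_key INTEGER REFERENCES item_dim(item_key),
    dollars_sold REAL
);
INSERT INTO time_dim VALUES (1, 'Q1', 2013), (2, 'Q2', 2013);
INSERT INTO item_dim VALUES (1, 'TV', 'Sony', 'electronics'),
                            (2, 'Phone', 'Nokia', 'electronics');
INSERT INTO sales_fact VALUES (1, 1, 400.0), (1, 2, 150.0), (2, 1, 300.0), (2, 2, 250.0);
""")

# A typical star-join query: join the fact table to its dimensions and aggregate.
rows = cur.execute("""
SELECT t.quarter, i.item_name, SUM(f.dollars_sold)
FROM sales_fact f
JOIN time_dim t ON f.time_key = t.time_key
JOIN item_dim i ON f.item_key = i.item_key
GROUP BY t.quarter, i.item_name
ORDER BY t.quarter, i.item_name
""").fetchall()
print(rows)
# [('Q1', 'Phone', 150.0), ('Q1', 'TV', 400.0), ('Q2', 'Phone', 250.0), ('Q2', 'TV', 300.0)]
```

In a snowflake schema, `item_dim` might itself be normalized, for example by moving brand or supplier details into a separate table, at the cost of an extra join in queries like the one above. A fact constellation would add a second fact table sharing `time_dim` and `item_dim`.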

25. Explain the OLAP operations in the multidimensional model.

_ Roll-up

_ Drill-down

_ Slice and dice

_ Pivot or rotate
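The four OLAP operations can be illustrated on a tiny cube stored as a Python dict keyed by (city, quarter, item). Roll-up climbs the location hierarchy from city to country, slice selects on one dimension, dice selects on two or more dimensions, and pivot swaps the axes of a 2-D view. The cube contents and the city-to-country mapping are hypothetical; drill-down is the reverse of roll-up and needs the more detailed data to be available.

```python
from collections import defaultdict

# Hypothetical 3-D cube: (city, quarter, item) -> dollars sold.
cube = {
    ("Chennai", "Q1", "TV"): 100, ("Chennai", "Q2", "TV"): 120,
    ("Mumbai", "Q1", "TV"): 200,  ("Mumbai", "Q1", "Phone"): 80,
    ("Toronto", "Q2", "Phone"): 90,
}
city_to_country = {"Chennai": "India", "Mumbai": "India", "Toronto": "Canada"}

def roll_up(cube):
    """Roll-up: climb the location hierarchy from city to country, summing the measure."""
    out = defaultdict(int)
    for (city, quarter, item), value in cube.items():
        out[(city_to_country[city], quarter, item)] += value
    return dict(out)

def slice_(cube, quarter):
    """Slice: select on one dimension (quarter), giving a 2-D subcube."""
    return {(c, i): v for (c, q, i), v in cube.items() if q == quarter}

def dice(cube, cities, quarters):
    """Dice: select on two or more dimensions, giving a smaller subcube."""
    return {k: v for k, v in cube.items() if k[0] in cities and k[1] in quarters}

def pivot(view):
    """Pivot: swap the two axes of a 2-D view."""
    return {(b, a): v for (a, b), v in view.items()}

print(roll_up(cube)[("India", "Q1", "TV")])          # 300
print(slice_(cube, "Q1"))
print(pivot(slice_(cube, "Q1"))[("TV", "Mumbai")])   # 200
```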

26. Explain the design and construction of a data warehouse.

_ Design of a data warehouse

• Top-down view

• Data source view

• Data warehouse view

• Business query view

_ Process of data warehouse design

27. Explain the three-tier data warehouse architecture.

_ Warehouse database server (bottom tier)

_ OLAP server (middle tier)

_ Client (top tier)

28. Explain indexing.

_ Definition

_ B-Tree indexing

_ Bitmap indexing

_ Join indexing
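In a bitmap index, each distinct value of an attribute gets a bit vector with one bit per row; bit r is 1 when row r has that value, so selections become fast bitwise operations. The sketch below uses Python integers as bit vectors; the `item_type` and `city` columns and their values are hypothetical.

```python
def build_bitmap_index(column):
    """Map each distinct value to an int whose bit r is set if row r has that value."""
    index = {}
    for row, value in enumerate(column):
        index[value] = index.get(value, 0) | (1 << row)
    return index

def rows_from_bitmap(bits):
    """List the row numbers whose bits are set."""
    return [r for r in range(bits.bit_length()) if (bits >> r) & 1]

item_type = ["home", "computer", "phone", "computer", "home"]
city = ["Chennai", "Delhi", "Chennai", "Chennai", "Delhi"]
type_idx = build_bitmap_index(item_type)
city_idx = build_bitmap_index(city)

# "item_type = 'computer' AND city = 'Chennai'" becomes a single bitwise AND.
match = type_idx["computer"] & city_idx["Chennai"]
print(rows_from_bitmap(match))  # [3]
```

Bitmap indexes work best for attributes with few distinct values; a join index, by contrast, records the joinable rows of two relations in advance.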

29. Write notes on the metadata repository.

_ Definition

_ Structure of the data warehouse

_ Operational metadata

_ Algorithms used for summarization

_ Mapping from operational environment to data warehouse

_ Data related to system performance

_ Business metadata

30. Write short notes on VLDB.

_ Definition

_ Challenges related to database technologies

_ Issues in VLDB

UNIT-V

31. Explain data mining applications for biomedical and DNA data analysis.

_ Semantic integration of heterogeneous, distributed genome databases

_ Similarity search and comparison among DNA sequences

_ Association analysis.

_ Path analysis

_ Visualization tools and genetic data analysis.

32. Explain data mining applications for financial data analysis.

_ Loan payment prediction and customer credit policy analysis.

_ Classification and clustering of customers for targeted marketing.

_ Detection of money laundering and other financial crimes.

33. Explain data mining applications for retail industry.

_ Multidimensional analysis of sales, customers, products, time and region.

_ Analysis of the effectiveness of sales campaigns.

_ Customer retention: analysis of customer loyalty.

_ Purchase recommendation and cross-reference of items.

34. Explain data mining applications for Telecommunication industry.

_ Multidimensional analysis of telecommunication data.

_ Fraudulent pattern analysis and the identification of unusual patterns.

_ Multidimensional association and sequential pattern analysis

_ Use of visualization tools in telecommunication data analysis.

35. Explain the DBMiner tool in data mining.

_ System architecture

_ Input and Output

_ Data mining tasks supported by the system

_ Support of task and method selection

_ Support of the KDD process

_ Main applications

_ Current status

36. Explain how data mining is used in health care analysis.

_ Health care data mining and its aims

_ Health care data mining techniques

_ Segmenting patients into groups

_ Identifying patients into groups

_ Identifying patients with recurring health problems

_ Relation between disease and symptoms

_ Curbing the treatment costs

_ Predicting medical diagnosis

_ Medical research

_ Hospital administration

_ Applications of data mining in health care

_ Conclusion

37. Explain how data mining is used in banking industry.

_ Data collected by data mining in banking

_ Banking data mining tools

_ Mining customer data of bank

_ Mining for prediction and forecasting

_ Mining for fraud detection

_ Mining for cross selling bank services

_ Mining for identifying customer preferences

_ Applications of data mining in banking

_ Conclusion

38. Explain the types of data mining.

_ Audio data mining

_ Video data mining

_ Image data mining

_ Scientific and statistical data mining

DATA WAREHOUSING AND DATA MINING, 2 Mark, Unit V

1. How are data mining tools classified?

• Commercial Tools

• Public domain Tools

• Research prototypes

2. What are commercial tools?

Commercial tools are products sold by companies and are usually associated with consulting services from the same company, for example:

1. ‘Intelligent Miner’ from IBM

2. ‘SAS’ System from SAS Institute

3. ‘Thought’ from Right Information Systems, etc.

3. What are public domain tools?

Public domain tools are largely freeware, requiring at most a registration fee, for example ‘Brute’ from the University of Washington and ‘MLC++’ from Stanford University, Stanford, California.

4. What are research prototypes?

Research prototypes are tools developed in research settings, some of which may find their way into the commercial market, for example ‘DBMiner’ from Simon Fraser University, British Columbia, and the ‘Mining Kernel System’ from the University of Ulster, Northern Ireland.

5. What is the difference between generic single-task tools and generic multi-task tools?

Generic single-task tools generally use neural networks or decision trees. They cover only the data mining part and require extensive pre-processing and post-processing steps.

Generic multi-task tools offer modules for pre-processing and post-processing steps and also offer a broad selection of popular data mining algorithms, such as clustering.

6. In which areas are data warehouses used at present, and in which may they be used in future?

The potential subject areas in which data warehouses may be developed, at present and in the future, include:

1. Census data:

The Registrar General and Census Commissioner of India compiles information on all individuals, villages, population groups, etc. every ten years. This information is wide-ranging, for example the individual slip, a compilation of information on individual households, of which a database of a 5% sample is maintained for analysis. A data warehouse can be built from this database and OLAP techniques can be applied to it. Data mining can also be performed for analysis and knowledge discovery.

2. Prices of essential commodities:

The Ministry of Food and Civil Supplies, Government of India, compiles daily data from about 300 observation centers across the country on the prices of essential commodities such as rice and edible oil. A data warehouse can be built for this data, and OLAP techniques can be applied to analyze it.

7. What are the other areas for Data warehousing and data mining?

• Agriculture

• Rural development

• Health

• Planning

• Education

• Commerce and Trade

8. Specify some of the sectors in which data warehousing and data mining are used?

• Tourism

• Program Implementation

• Revenue

• Economic Affairs

• Audit and Accounts

9. Describe the use of DBMiner.

DBMiner is used to perform data mining functions, including characterization, association, classification, prediction and clustering.

10. What are the applications of DBMiner?

The DBMiner system can be used as a general-purpose online analytical mining system for both OLAP and data mining in relational databases and data warehouses.

It can be used on medium to large relational databases with fast response times.

11. Give some data mining tools.

DBMiner

GeoMiner

Multimedia miner

WeblogMiner

12. Mention some of the application areas of data mining.

DNA analysis

Financial data analysis

Retail Industry

Telecommunication industry

Market analysis

Banking industry

Health care analysis.

13. Differentiate between a data query and a knowledge query.

A data query finds concrete data stored in a database and corresponds to a

basic retrieval statement in a database system.

A knowledge query finds rules, patterns and other kinds of knowledge in a

database and corresponds to querying database knowledge including

deduction rules, integrity constraints, generalized rules, frequent patterns and

other regularities.

14. Differentiate between direct query answering and intelligent query answering.

Direct query answering means that a query is answered by returning exactly what is being asked.

Intelligent query answering consists of analyzing the intent of query and

providing generalized, neighborhood, or associated information relevant to the

query.

15. Define visual data mining.

Visual data mining discovers implicit and useful knowledge from large data sets using data and/or knowledge visualization techniques. It integrates data visualization with data mining.

16. What does audio data mining mean?

Audio data mining uses audio signals to indicate patterns of data or the features of data mining results. Patterns are transformed into sound and music, so that interesting or unusual patterns can be identified by listening to pitches, rhythms, tunes and melodies.

Steps involved in DNA analysis

Semantic integration of heterogeneous, distributed genome databases

Similarity search and comparison among DNA sequences

Association analysis: identification of co-occurring gene sequences

Path analysis: Linking genes to different stages of disease development

Visualization tools and genetic data analysis

17. What factors should be considered when choosing a data mining system?

Data types

System issues

Data sources

Data Mining functions and methodologies

Coupling data mining with database and/or data warehouse systems

Scalability

Visualization tools

Data mining query language and graphical user interface.

18. Define DMQL.

DMQL stands for Data Mining Query Language.

It specifies clauses and syntax for performing different types of data mining tasks, for example data classification, data clustering and mining association rules. It uses an SQL-like syntax to mine databases.

19. Define text mining.

Text mining is the extraction of meaningful information from large amounts of free-format textual data.

It is useful in artificial intelligence and pattern matching.

It is also known as text data mining, knowledge discovery from text, or content analysis.

20. What does web mining mean?

Web mining is a technique for processing the information available on the web and searching it for useful data.

It is used to discover web pages, text documents, multimedia files, images, and other types of resources on the web.

It is used in several fields such as e-commerce, information filtering, fraud detection, and education and research.

21. Define spatial data mining.

Spatial data mining is the extraction of undiscovered and implicit spatial information.

Spatial data: data that is associated with a location.

It is used in several fields such as geography, geology and medical imaging.

22. Explain multimedia data mining.

Multimedia data mining mines large multimedia databases.

Rather than retrieving specific information from multimedia databases, it derives new relationships, trends, and patterns from the stored multimedia data.

It is used in medical diagnosis, stock markets, the animation industry, the airline industry, traffic management systems, surveillance systems, etc.

DATA WAREHOUSING AND DATA MINING, 2 Mark, Unit IV

1. Define data warehouse.

A data warehouse is a repository of data collected from multiple heterogeneous sources, organized under a unified schema at a single site to facilitate management decision making.

(or)

A data warehouse is a subject-oriented, integrated, time-variant and nonvolatile collection of data in support of management’s decision-making process.

2. What are operational databases?

The large databases that organizations maintain and update through daily transactions are called operational databases.

3. Define OLTP.

An on-line operational database system that performs day-to-day transactions and supports efficient storage, retrieval and management of large amounts of data is called an on-line transaction processing (OLTP) system.

4. Define OLAP.

Data warehouse systems serve users (or) knowledge workers in the role of data analysis and decision making. Such systems can organize and present data in various formats and are known as on-line analytical processing (OLAP) systems.

5. How is database design represented in OLTP systems?

Entity-relationship (ER) model

6. How is database design represented in OLAP systems?

Star schema

Snowflake schema

Fact constellation schema

7. Write short notes on the multidimensional data model.

Data warehouses and OLAP tools are based on a multidimensional data model. This model is used for the design of corporate data warehouses and departmental data marts. It can be represented by a star schema, a snowflake schema or a fact constellation schema. The core of the multidimensional model is the data cube.

8. Define data cube.

A data cube consists of a large set of facts (or) measures and a number of dimensions.

9. What are facts?

Facts are numerical measures. Facts can also be considered as quantities by which we can analyze the relationships between dimensions.

10. What are dimensions?

Dimensions are the entities (or) perspectives with respect to which an organization wants to keep records. They are hierarchical in nature.

11. Define dimension table.

A dimension table is used to describe a dimension.

(e.g.) A dimension table for item may contain the attributes item_name, brand and type.

12. Define fact table.

A fact table contains the names of the facts (or) measures as well as keys to each of the related dimension tables.

13. What is a lattice of cuboids?

In the data warehousing research literature, a cube is also called a cuboid. For a given set of dimensions, we can construct a lattice of cuboids, each showing the data at a different level of summarization. The lattice of cuboids is also referred to as a data cube.

14. What is the apex cuboid?

The 0-D cuboid, which holds the highest level of summarization, is called the apex cuboid. The apex cuboid is typically denoted by all.
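For n dimensions without concept hierarchies, the lattice contains 2^n cuboids, one for each subset of the dimensions, from the n-D base cuboid down to the 0-D apex cuboid. The sketch below computes every cuboid of a small 3-D fact set using `itertools.combinations`; the dimension names and sales figures are hypothetical.

```python
from collections import defaultdict
from itertools import combinations

dims = ("time", "item", "location")
# Hypothetical base facts: (time, item, location) -> dollars sold.
facts = {("Q1", "TV", "Chennai"): 100, ("Q1", "Phone", "Delhi"): 50,
         ("Q2", "TV", "Delhi"): 70}

def compute_cube(facts, dims):
    """Return {dimension subset: {group-by key: summed measure}} for all 2**n cuboids."""
    cube = {}
    for k in range(len(dims), -1, -1):  # from the base cuboid down to the apex
        for subset in combinations(range(len(dims)), k):
            cuboid = defaultdict(int)
            for key, value in facts.items():
                cuboid[tuple(key[i] for i in subset)] += value
            cube[tuple(dims[i] for i in subset)] = dict(cuboid)
    return cube

cube = compute_cube(facts, dims)
print(len(cube))    # 8 cuboids = 2**3
print(cube[()])     # apex cuboid ("all"): {(): 220}
```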

15. List out the components of a star schema.

_ A large central table (fact table) containing the bulk of the data with no redundancy.

_ A set of smaller attendant tables (dimension tables), one for each dimension.

16. What is a snowflake schema?

The snowflake schema is a variant of the star schema model, where some dimension tables are normalized, thereby further splitting the data into additional tables.

17. List out the components of a fact constellation schema.

A fact constellation schema has multiple fact tables that share dimension tables. This kind of schema can be viewed as a collection of stars, and hence it is also known as a galaxy schema (or) fact constellation schema.

18. Point out the major difference between the star schema and the snowflake schema.

The dimension tables of the snowflake schema model may be kept in normalized form to reduce redundancies, whereas the dimension tables of a star schema are not normalized. Normalized tables are easy to maintain and save storage space.

19. Which is more popular in data warehouse design, the star schema model (or) the snowflake schema model?

The star schema model, because the snowflake structure can reduce the effectiveness of browsing, since more joins will be needed to execute a query.

20. Define concept hierarchy.

A concept hierarchy defines a sequence of mappings from a set of low-level concepts to higher-level, more general concepts.

21. Define total order.

If the attributes of a dimension form a concept hierarchy such as “street < city < province_or_state < country”, the hierarchy is a total order.

Fig: Total order for location (street < city < province or state < country)

22. Define partial order.

If the attributes of a dimension form a lattice such as “day < {month < quarter; week} < year”, the hierarchy is a partial order.

23. Define schema hierarchy.

A concept hierarchy that is a total (or) partial order among attributes in a database

schema is called a schema hierarchy.

24. List out the OLAP operations in the multidimensional data model.

_ Roll-up

_ Drill-down

_ Slice and dice

_ Pivot (or) rotate

25. What is the roll-up operation?

The roll-up operation, also called the drill-up operation, performs aggregation on a data cube either by climbing up a concept hierarchy for a dimension (or) by dimension reduction.

26. What is the drill-down operation?

Drill-down is the reverse of the roll-up operation. It navigates from less detailed data to more detailed data. Drill-down can be performed by stepping down a concept hierarchy for a dimension (or) by introducing additional dimensions.

27. What is the slice operation?

The slice operation performs a selection on one dimension of the cube, resulting in a subcube.

28. What is the dice operation?

The dice operation defines a subcube by performing a selection on two (or) more dimensions.

29. What is the pivot operation?

Pivot (or rotate) is a visualization operation that rotates the data axes in view to provide an alternative presentation of the data.

30. List out the views in the design of a data warehouse.

_ Top-down view

_ Data source view

_ Data warehouse view

_ Business query view

31. What are the methods for developing large software systems?

_ Waterfall method

_ Spiral method

32. How does the waterfall method work?

The waterfall method performs a structured and systematic analysis at each step before proceeding to the next, like a waterfall falling from one step to the next.

33. How does the spiral method work?

The spiral method involves the rapid generation of increasingly functional systems, with short intervals between successive releases. It is considered a good choice for data warehouse development, especially for data marts, because the turnaround time is short, modifications can be done quickly, and new designs and technologies can be adapted in a timely manner.

34. List out the steps of the data warehouse design process.

_ Choose a business process to model.

_ Choose the grain of the business process

_ Choose the dimensions that will apply to each fact table record.

_ Choose the measures that will populate each fact table record.

35. Define ROLAP.

The relational OLAP (ROLAP) model is an extended relational DBMS that maps operations on multidimensional data to standard relational operations.

36. Define MOLAP.

The multidimensional OLAP (MOLAP) model is a special-purpose server that directly implements multidimensional data and operations.

37. Define HOLAP.

The hybrid OLAP (HOLAP) approach combines ROLAP and MOLAP technology, benefiting from the greater scalability of ROLAP and the faster computation of MOLAP, (i.e.) a HOLAP server may allow large volumes of detail data to be stored in a relational database, while aggregations are kept in a separate MOLAP store.

38. What is an enterprise warehouse?

An enterprise warehouse collects all of the information about subjects spanning the entire organization. It provides corporate-wide data integration, usually from one (or) more operational systems (or) external information providers. It contains detailed data as well as summarized data and can range in size from a few gigabytes to hundreds of gigabytes, terabytes (or) beyond. An enterprise data warehouse may be implemented on traditional mainframes, UNIX superservers (or) parallel architecture platforms. It requires extensive business modeling and may take years to design and build.

39. What is a data mart?

A data mart is a database that contains a subset of the data present in a data warehouse. Data marts are created to structure the data in a data warehouse according to issues such as hardware platforms and access control strategies. We can divide a data warehouse into data marts after the data warehouse has been created. Data marts are usually implemented on low-cost departmental servers that are UNIX (or) Windows/NT based. The implementation cycle of a data mart is likely to be measured in weeks rather than months (or) years.

40. What are dependent and independent data marts?

Dependent data marts are sourced directly from enterprise data warehouses. Independent data marts are sourced from data captured from one (or) more operational systems (or) external information providers, (or) from data generated locally within a particular department (or) geographic area.

41. What is a virtual warehouse?

A virtual warehouse is a set of views over operational databases. For efficient query processing, only some of the possible summary views may be materialized. A virtual warehouse is easy to build but requires excess capacity on operational database servers.

42. Define indexing.

Indexing is a technique used for efficient data retrieval (or) faster data access. When a table grows in volume, its indexes also grow in size, requiring more storage.

43. What are the types of indexing?

_ B-Tree indexing

_ Bitmap indexing

_ Join indexing

44. Define metadata.

In a data warehouse, metadata are data about data, (i.e.) metadata are the data that define warehouse objects. Metadata are created for the data names and definitions of the given warehouse.

45. Define VLDB.

VLDB stands for Very Large Database. A database whose size is greater than 100 GB is said to be a very large database.

Thanks for visiting

I welcome everyone who visits, or has visited, my blog. There is no one who does not like praise, and no true person refuses to let others correct his faults; I am an ordinary person who has understood this a little. Please point out my mistakes, and share what you found good. Even small recognition brings freshness to the mind and to life.

Thank you, and greetings.
