One of the main differences between statistical analysis and data mining

Statistical analysis and data mining are two methods for exploring and analyzing data that are common in both the academic and commercial sectors. Although statistical analysis has a long scientific history, data mining is a more recent approach to data analysis that has emerged from computer science. In this article I want to introduce these methods and outline what I believe is one of the main differences between the two areas of analysis.

In statistical analysis, a practitioner typically develops a hypothesis and then tests its validity by running statistical tests on data that may have been collected for that purpose. For example, if an analyst wanted to examine the relationship between income level and the ability to get loans, the analyst might hypothesize that there is a correlation between a person's income level and the amount of credit they qualify for.

The analyst could then test this hypothesis using data on a number of people, including their income levels and the credit available to them. A test could then be run that may establish a high degree of confidence that there is indeed a correlation between income and credit. The point here is that the analyst first formulated a hypothesis and then used statistical tests on data to provide evidence for or against it.
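To make the hypothesis-driven workflow concrete, here is a minimal sketch in Python using only the standard library. It assumes the hypothesis is "income and credit are correlated" and checks it with a Pearson correlation plus a permutation test; that test is just one reasonable choice among many (a t-test on the correlation coefficient would be another). The helper names `pearson_r` and `permutation_test` and the income/credit figures are hypothetical, made up purely for illustration.

```python
import random
import statistics


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)


def permutation_test(xs, ys, n_perm=10_000, seed=0):
    """Two-sided permutation test of H0: income and credit are uncorrelated.

    Shuffling ys breaks any real link with xs, so the shuffled correlations
    show how large |r| tends to be by chance alone.
    """
    rng = random.Random(seed)
    observed = pearson_r(xs, ys)
    shuffled = list(ys)
    extreme = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if abs(pearson_r(xs, shuffled)) >= abs(observed):
            extreme += 1
    # Add-one correction keeps the estimated p-value strictly positive.
    p_value = (extreme + 1) / (n_perm + 1)
    return observed, p_value


# Hypothetical, made-up figures (thousands per year), for illustration only.
income = [22, 35, 41, 48, 55, 63, 70, 82, 95, 110]
credit = [3, 5, 6, 8, 9, 10, 12, 14, 15, 19]

r, p = permutation_test(income, credit)
print(f"r = {r:.3f}, p = {p:.4f}")
```

The key point is the order of operations: the hypothesis (a correlation between income and credit) is fixed before the data is examined, and the small p-value is what supplies the "confidence" in it.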

Data mining is another area of data analysis, one that has emerged more recently from computer science and that differs from traditional statistical analysis in a number of ways. First, many data mining techniques are designed to be applied to very large databases, whereas statistical analysis is often designed to generate evidence for or against a hypothesis from a limited set of data.

Probably the most significant difference, however, is that data mining techniques are used not so much to build confidence in a hypothesis as to uncover unknown relationships that may be present in the data. This is probably best explained by example. Whereas in the case above the statistician formed a hypothesis about income levels and applicants' ability to get loans, in data mining there is usually no hypothesis at the outset. A data mining analyst might have a large data set of loans that have been given to people, along with demographic information about those people, such as their income level, their age, any existing debts, and whether they have defaulted on a loan before.

A data mining technique could then search this large data set and uncover a previously unknown link between income levels, current debt, and the ability to get loans.
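As a contrast with the hypothesis test, the following minimal sketch starts with no hypothesis at all: it scans every pair of columns in a loan table and ranks the pairs by the strength of their correlation. This exhaustive pairwise scan is a deliberately simple stand-in for real data mining algorithms (such as association rule mining or decision trees), not a description of any particular one. The column names, the `mine_pairwise_links` helper, the `min_abs_r` threshold, and all the figures are hypothetical; correlating the 0/1 `prior_default` column with numeric columns gives a point-biserial correlation.

```python
import itertools
import statistics


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)


def mine_pairwise_links(table, min_abs_r=0.5):
    """Scan every pair of columns and rank them by |correlation|.

    No hypothesis is stated up front: the search itself surfaces candidate
    relationships, which still need checking before anyone relies on them.
    """
    findings = []
    for a, b in itertools.combinations(sorted(table), 2):
        r = pearson_r(table[a], table[b])
        if abs(r) >= min_abs_r:
            findings.append((a, b, r))
    # Strongest associations first, regardless of sign.
    findings.sort(key=lambda f: abs(f[2]), reverse=True)
    return findings


# Hypothetical loan records, one list per column; prior_default: 1 = yes, 0 = no.
loans = {
    "income": [22, 35, 41, 48, 55, 63, 70, 82, 95, 110],
    "age": [23, 45, 31, 52, 38, 29, 61, 44, 36, 50],
    "existing_debt": [15, 12, 20, 5, 9, 25, 4, 18, 7, 6],
    "prior_default": [1, 0, 1, 0, 0, 1, 0, 1, 0, 0],
    "credit_granted": [3, 5, 6, 8, 9, 10, 12, 14, 15, 19],
}

for a, b, r in mine_pairwise_links(loans):
    print(f"{a} ~ {b}: r = {r:+.2f}")
```

Because the scan tests every pair, some strong-looking correlations will turn up by chance on large tables, so relationships found this way are best treated as leads to be validated rather than as confirmed findings.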

Although there are quite a few differences between statistical analysis and data mining, I believe this difference is at the heart of the matter. Much of statistical analysis involves analyzing data to build confidence for or against a hypothesis, whereas data mining is more often about applying algorithms to data to uncover previously unforeseen relationships.