Thursday, June 26, 2008

Theory vs. Data - "All models are wrong, and increasingly you can succeed without them"?

I am not claiming to be competent enough to have the deciding say in the newest (and HOTTEST) debate doing the rounds on the web right now. But as a technology follower and leader-to-be, I feel obliged to add my own comments on it.

".......Scientists are trained to recognize that correlation is not causation, that no conclusions should be drawn simply on the basis of correlation between X and Y (it could just be a coincidence). Instead, you must understand the underlying mechanisms that connect the two. Once you have a model, you can connect the data sets with confidence. Data without a model is just noise. But faced with massive data, this approach to science — hypothesize, model, test — is becoming obsolete.........."

".......There is now a better way. Petabytes allow us to say: "Correlation is enough." We can stop looking for models. We can analyze the data without hypotheses about what it might show. We can throw the numbers into the biggest computing clusters the world has ever seen and let statistical algorithms find patterns where science cannot......"

I don't think that data can ever replace models. True, Mr. Venter has done a lot of good for modern biology, but he is building on the knowledge of genes and replication mechanisms discovered earlier using the scientific method of observation, hypothesis, and validation.

Data is useful only up to the limits of the theory we already have. It is certainly a great help in fully comprehending the implications and applications of that theoretical background. But by no means can it generate the new theory needed for future testing. It can only tell us what something is, not why it is so.

PS - I love statistics, so please don't cite any weakness of mine in stats as the reason I sided with theory.


4 comments:

abhishek.iitm said...

Copycat... at least acknowledge the original writer in your blog.

Ankit Ashok said...

If you take the trouble of going to the link, you can read the whole article and find the name of the author.

But for lazy sloths like you: the main story was published on wired.com and was written by its editor-in-chief, Chris Anderson (canderson@wired.com).

Navneet Sankara said...

I fully agree with you on this one. And I totally disagree with the last statement made by Chris Anderson in the article.

... and science can advance even without coherent models, unified theories, or really any mechanistic explanation at all.

First of all, what is Science? Is it about finding relations between things? No. Finding these connections (correlation or causation) only gives us a handle on what we might need to look at. Is Science about explaining the world as it is? I'm not sure that it is.

My belief is that Science is about prediction. It's our fundamental desire to be able to reliably predict the outcome of every activity around us that has led to Scientific endeavour.

If you agree with this definition of the purpose of Science, you will see immediately that data only gives us knowledge of what variables will affect our prediction. Explanation of the world around us is only a necessary step in the process of reliable prediction.

So data-mining can help us establish relations in unstructured data, but it cannot give us Scientific understanding. Science is going to be (and already is) greatly influenced by the Google method, but that method only helps Science attain its ultimate goal.

vikash said...

You know... I don't know what to write.
Honestly, I can't understand any of the heavy stuff you have written here.