What you do with data is more important than merely having data

One of the biggest trends across industries is “big data”. Google, Facebook, Amazon, LinkedIn and other software giants are investing billions of dollars in data analytics. Many are doing this incredibly well and aim to predict their customers’ next move from their previous behaviour patterns. A small example is an Amazon advertisement that follows you on your Facebook feed based on the items you searched for on the Amazon website or app. This happens because their analytics have shown that following a customer’s digital footprints tends to increase sales.

Can we apply these concepts of big data and analytics to the field of medicine? The answer is “Yes”. But you might ask how. Big data analytics can be applied to huge medical databases to find answers to as yet unanswered questions in a particular medical field. In other words, big data can be applied to medical research to produce published papers that improve patient care.

In the developing world, especially in countries such as India, Thailand, Malaysia, Egypt and Brazil, there are excellent medical facilities that are delivering services and improving outcomes for their patients. These large institutions generate a huge amount of data every day and record it on electronic platforms called Electronic Medical Records or EMR. Imagine how many questions could be answered if some of that data were exported from the EMR systems and analysed using the big data concepts described above. The data is so rich and the numbers so large that few others could rival the results, and new paradigms could be set on their basis.

However, this is not happening currently. You may ask why. Are physicians and health care personnel not interested in improving standards of care? Of course they are. It is just that they have not received adequate training to turn the data they possess into meaningful publications. Among other things, an important aspect of doing good clinical research and publishing papers is analysing data well. A large sample size, though important, does not guarantee publication unless the data has been analysed in a novel manner to answer unanswered questions.

Always think of a control group against which to compare the outcomes of your primary study group. Investigators often overlook this, and it leads to rejections from the top journals. Having a control group allows group-wise comparative statistical analyses that make your results robust. If, for some reason, you are unable to have a pure control group, consider splitting your data into two groups based on a parameter you have recorded, such as two age groups. Ideally, split your data into two groups based on your outcome measure of interest. For example, if you are studying the influence of a particular injection on retinal thickness, you can split patients into two groups based on how much the thickness changed from before to after the injection. A tip is to split your data at the median value of the variable of interest, which gives two groups of roughly equal size for comparison.
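The median split described above can be sketched in a few lines of Python. This is only an illustration: the patient records, the variable names and the choice of age as the characteristic to compare are all invented for the example, and a real analysis would follow the comparison with a formal statistical test.

```python
from statistics import mean, median

# Hypothetical records: (age in years, change in retinal thickness in micrometres).
# These numbers are made up purely to illustrate the technique.
patients = [
    (70, 12), (58, 35), (52, 48), (66, 20),
    (45, 61), (63, 27), (49, 55), (55, 40),
]

# The cut-off is the median of the variable of interest (thickness change).
cutoff = median(change for _, change in patients)

# Patients at or below the median form one group, the rest form the other.
low_change = [p for p in patients if p[1] <= cutoff]
high_change = [p for p in patients if p[1] > cutoff]

# Compare another recorded characteristic (here, age) between the two groups.
mean_age_low = mean(age for age, _ in low_change)
mean_age_high = mean(age for age, _ in high_change)

print(f"cut-off: {cutoff}")
print(f"low-change group:  n={len(low_change)}, mean age={mean_age_low}")
print(f"high-change group: n={len(high_change)}, mean age={mean_age_high}")
```

Note that when several patients share the median value, they all fall into the lower group, so the two groups may not be exactly equal in size.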

Data analysis should be in sync with your study question. We often come across studies that promise “factors influencing” or “prognostic factors” in the title, but in which no statistical analysis has been done to understand these influences. A cause-effect relationship can be examined properly only with regression analysis, in which the influences of confounders are adjusted for; bear in mind that in observational data an adjusted association supports, but does not by itself prove, causation. Investigators should consider performing regression analysis in almost all studies with relatively large samples. This analysis often yields findings that make the results novel and worth publishing. Finally, regression allows predictive analysis, i.e. estimating the chances of surgical success or failure, cancer survival, future risk of disease and so on, with some certainty. This is similar to the Amazon example I mentioned at the start. Predictive analytics combined with clinical acumen will make you a much better clinician and help you expect realistic results in your daily practice.
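To see why adjusting for confounders matters, here is a minimal sketch in Python using ordinary least squares regression. Everything in it is an assumption made for illustration: the data are invented and noise-free, "treated" stands for having received a hypothetical injection, age plays the role of the confounder, and the small `ols` helper is written out by hand only so that the example runs without extra libraries. In practice you or your biostatistician would use a statistical package.

```python
def ols(X, y):
    """Least-squares coefficients, solving the normal equations (X'X) b = X'y."""
    k = len(X[0])
    # Augmented matrix [X'X | X'y].
    A = [
        [sum(row[i] * row[j] for row in X) for j in range(k)]
        + [sum(row[i] * yi for row, yi in zip(X, y))]
        for i in range(k)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(k):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    return [A[i][k] / A[i][i] for i in range(k)]


# Invented data: treated patients happen to be older than untreated ones.
age = [40, 45, 50, 55, 55, 60, 65, 70]
treated = [0, 0, 0, 0, 1, 1, 1, 1]
# The outcome is constructed so that the true treatment effect is 2.0
# and every extra year of age adds 0.5.
outcome = [1.0 + 2.0 * t + 0.5 * a for t, a in zip(treated, age)]

# Crude model: outcome ~ treated (age ignored).
crude_effect = ols([[1, t] for t in treated], outcome)[1]

# Adjusted model: outcome ~ treated + age.
adjusted_effect = ols([[1, t, a] for t, a in zip(treated, age)], outcome)[1]

print(f"crude treatment effect:    {crude_effect:.2f}")
print(f"adjusted treatment effect: {adjusted_effect:.2f}")
```

Because the treated patients in this made-up dataset are older, the crude comparison attributes part of the age effect to the treatment. Once age is included in the model, the estimate returns to the value that was built into the data.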

Certain other special tests such as survival analysis, nested regression and Rasch analysis may be required to get the best results from your data. The choice of these tests depends on the type of data and its distribution, and an expert biostatistician can suggest them. I recommend involving a statistician at the very beginning of your study to help with the study design and to create a statistical plan that will be used later.

It is imperative that those who plan to get involved in clinical research understand the basics of biostatistics. This knowledge will help you understand the finer nuances of your research, and it is these little things that make the difference between getting published in a top journal and getting accepted in a relatively low-ranking one. Above all, basic knowledge will help you communicate well with your biostatistician and interpret the results. You will also be in a position to ask your biostatistician for the specific set of analyses that best answers your study question.

In conclusion, what you do with your data is more important than having a lot of data. Try to understand the basics of biostatistics, ask your biostatistician pertinent questions and request goal-directed analyses. This will set you apart from others and help you climb the publication ladder faster, and with higher-quality papers, than your peers.