What can data science tell us about influenza?

by Gabriela Cybis

For most people, catching the flu is a common, minor inconvenience: a bad headache, maybe a fever, a sore throat and a cough. Sometimes they miss a day or two of work and are miserable at home for a while, but after a few days it is back to business as usual. Most of the time, only the very young or the elderly, with their weaker immune systems, are at risk of complications. However, occasionally a new strain of influenza comes along, reminding society of the perils of this widespread virus that has become part of our lives every winter. Most recently, this was the case with the H1N1 influenza strain known as swine flu, which reached pandemic proportions in only a few months and had global health authorities on alert. This has happened a few times over the last century, most memorably in the 1918 Spanish flu outbreak, which killed between 2.5% and 5% of the global population and had a significant impact on the course of the First World War. Whether in these newly emerging forms or in its more common seasonal variants, influenza is a serious burden on public health. Our biggest ally against it is information. In the current era of abundant data, statistical and computational methods are important tools for making sense of that data and building a better understanding of influenza.

The influenza virus infects both birds and mammals. Generally, different strains circulate in different species; however, every so often a virus will cross the species barrier to another host. Sometimes the same host will be infected with more than one influenza strain, and these strains can then recombine, creating a new hybrid virus. Settings where humans live in close proximity to domestic birds and pigs make for genetically promiscuous environments that are prime spots for the emergence of a new influenza strain in humans. Generally there is little to no immunity to such a new strain in the human population, which means that if the virus can sustain efficient human-to-human transmission, it has a clear path to pandemic status. Crossover from the animal reservoir was responsible for the 2009 H1N1 swine flu pandemic, the 2004 H5N1 Southeast Asia bird flu threat, and other outbreaks throughout history. For this reason, surveillance of the influenza strains that circulate in other animals, and statistical methods that analyze the resulting data, are important for assessing influenza risks for humans.

Global health authorities must have contingency plans in place for when the next serious outbreak emerges. But what are the best strategies to contain the spread of an epidemic? The current air travel network greatly increases our connectivity to every other country on the planet; should we significantly limit air travel despite the impact on the global economy? Schools and preschools have been identified as potential hotspots for transmission inside communities; should we keep all kids at home in the early days of an outbreak? If there are only limited resources for vaccination and antivirals, where should they be deployed in order to reduce the impact of the epidemic? Researchers have used mathematical models in extensive computer simulations to try to answer these questions. These efforts involve replicating in silico, in remarkable detail, the connectivity patterns of society, sometimes down to the level of household size, daily commute distance, frequency of air travel and age distribution. Specific outbreak predictions vary widely depending on the properties of the virus, but it seems that partial restrictions on mobility will sometimes only delay the course of an epidemic, and that carefully orchestrated combined approaches tend to have the highest impact on reducing the total number of influenza cases.
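The simulations described above are far more detailed than anything that fits in a few lines, but the qualitative point that reduced mobility can delay an epidemic without much changing its final size can be illustrated with a toy model. The following is a minimal sketch, not the method used in the studies mentioned: a deterministic two-city SIR (susceptible, infected, recovered) model in Python. The function name, the `mixing` parameter (a stand-in for travel volume) and the values of `beta` and `gamma` are all illustrative assumptions, not estimates for any real influenza strain.

```python
def simulate_two_cities(mixing, beta=0.5, gamma=0.25, days=400, dt=0.1):
    """Toy two-city SIR model integrated with simple Euler steps.

    mixing: assumed fraction of contacts made with the other city
            (a hypothetical stand-in for travel volume).
    beta, gamma: assumed transmission and recovery rates per day.
    Returns (day of the infection peak in city B, final fraction of
    city B that was ever infected).
    """
    # State is stored as population fractions; the outbreak starts in city A only.
    s = [1.0 - 1e-4, 1.0]
    i = [1e-4, 0.0]
    r = [0.0, 0.0]
    peak_day_b, peak_i_b = 0.0, 0.0

    for step in range(int(days / dt)):
        # Force of infection in each city mixes local and imported infections.
        force = [
            beta * ((1 - mixing) * i[0] + mixing * i[1]),
            beta * ((1 - mixing) * i[1] + mixing * i[0]),
        ]
        for city in range(2):
            new_infections = force[city] * s[city] * dt
            new_recoveries = gamma * i[city] * dt
            s[city] -= new_infections
            i[city] += new_infections - new_recoveries
            r[city] += new_recoveries
        # Track when infections in city B are at their highest.
        if i[1] > peak_i_b:
            peak_i_b = i[1]
            peak_day_b = (step + 1) * dt

    return peak_day_b, r[1]


if __name__ == "__main__":
    for mixing in (0.01, 0.001):
        peak_day, final_size = simulate_two_cities(mixing)
        print(f"mixing={mixing}: peak in city B on day {peak_day:.1f}, "
              f"final size {final_size:.3f}")
```

In this toy setting, cutting `mixing` tenfold pushes the peak in the second city later while leaving the fraction eventually infected almost unchanged. Real simulations add household structure, age groups, commuting and air travel data, so their conclusions can differ from this sketch.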

Although these emerging influenza strains have health authorities on alert for their pandemic potential, seasonal influenza also has an enormous impact on society, with a worldwide annual death toll between 300,000 and 500,000 and economic losses in the billions. By combining geographic, genetic and temporal information through statistical models, we can reconstruct the evolution of the epidemic and assess key features of this process. These studies have shown that although the virus is constantly changing, at any given moment its genetic variability is relatively small. Furthermore, it is highly likely that of all the seasonal influenza strains circulating at present, one will multiply and give rise to the entire seasonal influenza population in around 5 years. The descendants of all other viruses will most likely be extinct. For H3N2 influenza, this lineage of evolutionarily successful viruses is usually termed the trunk of the evolutionary tree, and it is of particular interest to researchers because it gives rise to all the different branches of the virus over time. Geographic reconstructions of H3N2 influenza’s evolution for recent years have estimated that this evolutionarily persistent lineage is located in China or Southeast Asia around 75% of the time, highlighting the importance of this region for the epidemiology of the seasonal flu. On a global scale, the dynamics of H3N2 influenza seem to be driven by air passenger flows, while at local scales its spread is also determined by processes that correlate with geographic distance.

A tree representation of genetic relationships between H3N2 influenza viruses from 2002 to 2007. The thicker line on the tree represents the successful evolutionary trunk lineage that gives rise to all influenza strains over time. The tree is colored according to estimated geographic location, indicating high permanence of the trunk in China and Southeast Asia. Figure courtesy of Lemey P, Rambaut A, Bedford T, Faria N, Bielejec F, et al.
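The idea that a single present-day lineage eventually becomes the ancestor of the whole future population can be illustrated with a very simple resampling model. The sketch below is an assumption-laden toy, not the statistical method behind the figure: it uses neutral Wright-Fisher resampling in Python, which ignores the immune selection that shapes real influenza evolution, and the function name, population size and seed are hypothetical choices made only for illustration.

```python
import random


def generations_until_single_ancestor(pop_size=50, seed=0, max_generations=100_000):
    """Neutral Wright-Fisher toy model (an illustrative assumption).

    Each virus in the starting population gets a unique label. In every
    generation, each new virus copies the label of a randomly chosen
    parent from the previous generation. Returns the number of
    generations until only one starting label survives, together with
    that label.
    """
    rng = random.Random(seed)  # local generator so runs are reproducible
    labels = list(range(pop_size))
    for generation in range(1, max_generations + 1):
        # Sample parents with replacement: some lineages leave several
        # descendants, others leave none and go extinct.
        labels = [labels[rng.randrange(pop_size)] for _ in range(pop_size)]
        if len(set(labels)) == 1:
            return generation, labels[0]
    raise RuntimeError("no single ancestor within max_generations")


if __name__ == "__main__":
    generations, ancestor = generations_until_single_ancestor()
    print(f"All viruses descend from starting virus {ancestor} "
          f"after {generations} generations")
```

Even with no selection at all, every lineage but one is eventually lost by chance. In real seasonal influenza the process is faster and far from neutral, which is why the trunk of the tree is so narrow.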

Our main means of protection against the virus is vaccination. Vaccines stimulate the immune system to generate a defense against the virus, so that when we get infected we already have defenses and can defeat the virus before getting sick. Unlike many vaccines that elicit protection for life, taking the influenza vaccine one year will only give you limited protection in the following seasons. As with most infections, you generate an immune response to the influenza strain you were vaccinated against. However, the virus changes so rapidly that this immune response is generally not effective against the viruses circulating the next year. The reason for this constant change in the virus is twofold. First, influenza’s genetic information is encoded in RNA, and for this reason it has a much higher mutation rate and is able to change faster than DNA viruses. Second, viruses that differ from current and past circulating forms have a better chance of survival, since there is little immunity against them in the human population; this principle fuels an arms race between viruses and human immunity, generating a trend of viral diversification known as immune escape. Since influenza changes so rapidly, it is not enough to have one influenza vaccine; the vaccine must be periodically updated to reflect current influenza diversity. The main problem with this update is that during the months it takes to produce the vaccine, the viral population has already changed. So when choosing the influenza strains that will be used to build the vaccine, we must select, among the current strains, the ones that best represent the future diversity of the virus. Forecasting the state of future influenza populations is a complex task, and the determination of vaccine strains involves combining genetic, immunogenic and epidemiological data. Analysis of past vaccines tells us that there is room for improvement in this process, and thus researchers are working on statistical models that integrate all these data to help us make better decisions in the selection of vaccine strains and, hopefully, produce more effective vaccines.

In comparison to the 1918 epidemic, we are much better prepared to deal with influenza today. We have better medical care to treat complications of the disease, and we have vaccines and antivirals to reduce the chances of infection and to treat it, but most of all, we have much better information on how the virus spreads and diversifies. Whether it is to assess the risk of new epidemics, understand geographic dispersion patterns, test the results of containment strategies before having to apply them in real crises, or help design real vaccines, computational and statistical tools have greatly improved our understanding of influenza. Building on the current wealth of information by combining different types of data through statistical and computational methods is an active area of research with great potential to help us manage this widespread virus.

Gabriela Cybis is a 2009 fellow of the Fulbright Science & Technology Award from Brazil. She completed her PhD in Biomathematics at the University of California, Los Angeles (UCLA) and currently holds a faculty position in the Statistics Department at Universidade Federal do Rio Grande do Sul (UFRGS-Brazil).
