dr.ricky online

Tag: data visualization

  • COVID-19 Tests

    Coronavirus in Texas
    Snapshot from Texas Tribune tracking of coronavirus testing in Texas.

    Before May 14, 2020, Texas's reported coronavirus test counts mixed results from two different kinds of tests: PCR tests and antibody tests. The PCR test looks for the genetic material of SARS-CoV-2. It answers the question: “Is the patient infected and contagious?”

    The other kind, the antibody test, looks for the presence of early antibodies in the blood. It answers the question: “Has the patient been infected in the past?” At this time, we do not know if the presence of antibodies confers immunity to the virus.

    Mixing the two results is highly problematic. It can make it appear that a smaller share of those tested are infectious, since antibody tests tend to come back negative more often, and that inflates the denominator. Part of the reason Gov. Abbott proceeded with Phase 2 of reopening Texas activities is that the increasing rate of detected infections was attributed to an increased number of tests being performed. However, that is not the case: much of the increase in testing is due to the mixing in of antibody tests. The rate of PCR testing is almost flat, but the rate of newly infected cases continues to rise.
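
    To make the denominator effect concrete, here is a minimal sketch of the arithmetic. The counts are hypothetical, chosen only for illustration, and are not Texas figures; `positivity_rate` is a helper name introduced for this example.

```python
def positivity_rate(positives: int, total_tests: int) -> float:
    """Return the share of tests that came back positive."""
    if total_tests <= 0:
        raise ValueError("total_tests must be positive")
    return positives / total_tests


# Hypothetical counts (NOT real Texas data), chosen to show the effect.
pcr_positive, pcr_total = 100, 1_000
antibody_positive, antibody_total = 10, 500

# Positivity computed from PCR tests alone.
pcr_only = positivity_rate(pcr_positive, pcr_total)

# Positivity when antibody tests are pooled into the same totals.
mixed = positivity_rate(
    pcr_positive + antibody_positive,
    pcr_total + antibody_total,
)

print(f"PCR only: {pcr_only:.1%}")
print(f"Mixed:    {mixed:.1%}")
```

    With these made-up numbers, the pooled rate comes out lower than the PCR-only rate even though the number of PCR positives has not changed, because the mostly negative antibody tests enlarge the denominator.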

  • Visualizing NCAA GPA data

    On 5 Sept 2018, the NCAA Research team tweeted out this chart:

    It reports the average core high school grade point averages (GPA) among NCAA Division I freshman student athletes. So, a bit of background: the National Collegiate Athletic Association governs just about all collegiate athletic programs in America, and the Division I schools devote the most money and resources to their athletic programs. A great deal of attention is thus focused on the Division I programs, almost to the detriment of the others (the divisions go all the way down to Division III). The GPA is usually used as a measure of academic performance, though it may not reflect the difficulty of the coursework. But this chart is an egregious use of “infographics” to mislead rather than to bring insight to data:

    • Without a Y-axis to denote scale, the use of bar charts here visually makes it appear that 3.77 is 7x higher than 3.07, when the actual difference is far smaller on the standard GPA scale, which tops out at 4.0.
    • The categorical use of the different sports makes it appear that it is the independent variable, and that GPA is what is being measured. But since the GPA was measured in high school, it actually precedes the sport.
    • Because of this switch in dependent and independent variables, a reader may interpret some form of causality – implying for example that choosing fencing will lead to better academic performance.
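
    The first point can be checked with a little arithmetic. The sketch below compares the true ratio of the two GPAs with the ratio of the drawn bar heights when the bars start from a hidden, non-zero baseline. The 7x figure is the visual estimate quoted above; the baseline it implies is a back-calculation, not something read off the original chart, and both function names are made up for this example.

```python
def visual_ratio(high: float, low: float, baseline: float = 0.0) -> float:
    """Ratio of drawn bar heights when both bars start at `baseline`."""
    if low <= baseline:
        raise ValueError("baseline must be below the smaller value")
    return (high - baseline) / (low - baseline)


def implied_baseline(high: float, low: float, ratio: float) -> float:
    """Baseline at which the `high` bar is drawn `ratio` times taller than `low`.

    Solves (high - b) = ratio * (low - b) for b.
    """
    if ratio <= 1:
        raise ValueError("ratio must be greater than 1")
    return (ratio * low - high) / (ratio - 1)


# Values quoted in the text above.
highest_gpa, lowest_gpa = 3.77, 3.07

# With an honest zero baseline, the bars differ by roughly 1.2x.
print(f"true ratio: {visual_ratio(highest_gpa, lowest_gpa):.2f}")

# Baseline that would make one bar look 7x taller than the other.
hidden = implied_baseline(highest_gpa, lowest_gpa, 7)
print(f"implied baseline: {hidden:.2f}")
```

    Under these assumptions, a 7x visual difference would require the bars to start just under 3.0, which is exactly the kind of truncation a labeled Y-axis would have exposed.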

    Good data visualization should serve to bring new insight to the data that isn’t evident from just looking at the numbers. The GPAs considered here range between 3 and 4, which corresponds to letter grades B to A. That is quite above average academically, and it is unsurprising: these are the high school GPAs of student athletes recruited to Division I schools, arguably the most competitive programs. This is a measure of their past academic performance, but it doesn’t say anything about how the sport chosen affects their current or future performance. The data does, however, tell us something about the sports programs themselves. Using the exact same data, I replotted the chart.

    High school GPA of males and females as recruited into NCAA Div 1 sports programs.

    The chart is in two parts – on the left is the section where a sport is available for both males and females, and on the right is a smaller section for sports that are gender specific. The axes go from 3.0 to 4.0, indicating the spread within this range. Sports are labeled accordingly.

    A linear relationship exists between enrolled female and male student athlete high school GPAs, regardless of sport program. What this means is that, at least within each sport, programs apply their GPA criteria in roughly the same proportion to both genders. This probably means that the sports programs recruit from the same communities for both men and women; that is, fencing programs put a heavier emphasis on high GPAs for admission than basketball programs do, regardless of gender. But we see a stark difference in the GPA cutoffs between genders: almost all athletic programs recruit females with a GPA above 3.5, while more than half of the athletic programs enrolled male student athletes with GPAs below 3.5. In fact, all the male-specific sport programs (baseball, wrestling and football) recruit with GPAs below 3.5. One cannot make definitive interpretations without further details on how the data is collected, but this implies that the barrier to entry to a collegiate athletic program, at least based on GPA, is significantly lower for males than for females. While some may think that this indicates superior academic performance among female student athletes, it could be an indicator of a systemic bias when recruiting women across all sports programs.

  • Vaccination Exemptions in the USA

    The United States Centers for Disease Control and Prevention (CDC) publishes the Morbidity and Mortality Weekly Report, in which it tracks vaccination rates in different states for children enrolled in kindergarten. One interesting table reports the rate of exemptions from vaccination, as well as the reasons behind them. Granted, different states have varying laws with regard to vaccination requirements, and only some separate the exemption reasons into medical, religious and other philosophical reasons, which makes getting consistent data problematic. But we do have good data for the 2015–2016 enrollment and the 2016–2017 enrollment.

    The reports themselves are straight tables, but data visualization helps in teasing out the meaning there.

    Summarizing the CDC reports for the 2015–2016 and 2016–2017 school years on the rate of vaccine exemptions among kindergarten students, divided by state. A number of states are excluded. Blue dots are for the earlier year, red dots for the data a year later. Note that for herd immunity, the general consensus is that about 95% of the population should be vaccinated. The Y-axis displays the ratio of medical to non-medical reasons given for the exemption. Note that with the exception of DC, all states have ratios below 1, which means that more people are seeking exemptions for religious or philosophical reasons than for medical ones.

    This data is dense, but it highlights some problematic states, like Oregon, which has an unusually high rate of vaccine exemptions, most of them for non-medical reasons. Let’s look at the trend from year to year.

    The arrows point in the direction that portends better public health trends: a drop in the rate of exemptions, and an increase in the ratio of medical to non-medical reasons. California and Vermont seem to be on the right track, but most of the country is actually inching in the wrong direction, with Nevada and Wisconsin leading the way.
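
    The two quantities behind these charts, the overall exemption rate and the ratio of medical to non-medical exemptions, can be sketched in a few lines. This is an illustration only: the class, the function, and the sample percentages are hypothetical and not taken from the CDC tables, and treating the total rate as the simple sum of the two categories is an assumption.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class ExemptionSnapshot:
    """One state's kindergarten exemption percentages for one school year."""
    medical_pct: float
    non_medical_pct: float

    @property
    def total_pct(self) -> float:
        # Assumption: the total rate is the sum of the two categories.
        return self.medical_pct + self.non_medical_pct

    @property
    def medical_ratio(self) -> float:
        # Ratio of medical to non-medical exemptions (the Y-axis above).
        if self.non_medical_pct == 0:
            return math.inf
        return self.medical_pct / self.non_medical_pct


def trend(earlier: ExemptionSnapshot, later: ExemptionSnapshot) -> str:
    """Classify a year-over-year change using the two criteria in the text."""
    rate_down = later.total_pct < earlier.total_pct
    rate_up = later.total_pct > earlier.total_pct
    ratio_up = later.medical_ratio > earlier.medical_ratio
    ratio_down = later.medical_ratio < earlier.medical_ratio
    if rate_down and ratio_up:
        return "improving"
    if rate_up and ratio_down:
        return "worsening"
    return "mixed"


# Hypothetical values for a single state (NOT real CDC data).
year_one = ExemptionSnapshot(medical_pct=0.2, non_medical_pct=2.0)
year_two = ExemptionSnapshot(medical_pct=0.3, non_medical_pct=1.5)
print(trend(year_one, year_two))
```

    A ratio below 1 in this sketch corresponds to a state where non-medical exemptions outnumber medical ones, matching the reading of the first chart.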

    Sadly, the antivaccinationist movement seems to be gaining mindshare, just by manufacturing doubt and exploiting parental concern. Non-medical exemptions are key to this degradation of our public health system.