The Relationship between Body Size and Lifespan

Introduction

While aging is an inevitable process for most species, there is an incredible diversity of lifespans throughout the Tree of Life, ranging from a few days to several millennia. For researchers interested in the fundamental biology of aging, finding out which aspects of an organism’s biology correlate with lifespan is an important first step toward concrete explanations for its longevity.
For example, in 1975, Dr. Richard Peto published a paper showing that the very different sizes and lifespans of humans and mice did not translate into correspondingly different cancer rates. This became known as Peto’s Paradox: the original expectation was that, over a lifetime, every cell accumulates mutations that could eventually cause it to become cancerous, so an animal with more cells should face an even higher lifetime risk of cancer. In fact, it turns out that there is no relationship between body size, lifespan, and cancer, which is the fact that underlies the focus of my own research!
As we will explore in this section, this paradox is further complicated by another unexpected relationship: animals that are larger tend to also live longer.

Graphing the Data

For this analysis, we will be using the AnAge database of ageing and life history in animals. This database has entries for over 4200 species of animals (also 2 plants and 3 fungi) with data like max lifespan, growth rates and weights at different life stages, descriptions, and metabolism, amongst other things.

First, let’s take a look at the data itself:

# These are the packages we will be using in this analysis
library(tidyverse)
options(readr.num_columns = 0)
library(ggpubr)
library(plotly)
# Read the data into a dataframe:
anage <- read_tsv("../data/other/anage_build14.txt", 
                  col_names = TRUE, 
                  col_types = list(
                    "References" = col_character(),   # Needs to be specified or else it's interpreted as <int>
                    "Sample size" = col_factor(c("tiny", "small", "medium", "large", "huge"), ordered = TRUE),
                    "Data quality" = col_factor(c("low", "questionable", "acceptable", "high"), ordered = TRUE)
                    )
                  )
# Look at the data using str()
str(anage)
## Classes 'tbl_df', 'tbl' and 'data.frame':    4219 obs. of  31 variables:
##  $ HAGRID                          : chr  "00003" "00005" "00006" "00008" ...
##  $ Kingdom                         : chr  "Animalia" "Animalia" "Animalia" "Animalia" ...
##  $ Phylum                          : chr  "Arthropoda" "Arthropoda" "Arthropoda" "Arthropoda" ...
##  $ Class                           : chr  "Branchiopoda" "Insecta" "Insecta" "Insecta" ...
##  $ Order                           : chr  "Diplostraca" "Diptera" "Hymenoptera" "Hymenoptera" ...
##  $ Family                          : chr  "Daphniidae" "Drosophilidae" "Apidae" "Formicidae" ...
##  $ Genus                           : chr  "Daphnia" "Drosophila" "Apis" "Cardiocondyla" ...
##  $ Species                         : chr  "pulicaria" "melanogaster" "mellifera" "obscurior" ...
##  $ Common name                     : chr  "Daphnia" "Fruit fly" "Honey bee" "Cardiocondyla obscurior" ...
##  $ Female maturity (days)          : int  NA 7 NA NA NA NA 15 NA 1095 NA ...
##  $ Male maturity (days)            : int  NA 7 NA NA NA NA 15 NA 1095 NA ...
##  $ Gestation/Incubation (days)     : int  NA NA NA NA NA NA NA NA NA NA ...
##  $ Weaning (days)                  : chr  NA NA NA NA ...
##  $ Litter/Clutch size              : num  NA NA NA NA NA NA NA NA 190 148 ...
##  $ Litters/Clutches per year       : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ Inter-litter/Interbirth interval: int  NA NA NA NA NA NA NA NA NA NA ...
##  $ Birth weight (g)                : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ Weaning weight (g)              : chr  NA NA NA NA ...
##  $ Adult weight (g)                : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ Growth rate (1/days)            : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ Maximum longevity (yrs)         : num  0.19 0.3 8 0.5 28 0.4 0.5 100 20 15.8 ...
##  $ Source                          : chr  NA NA "812" "1293" ...
##  $ Specimen origin                 : chr  "unknown" "captivity" "unknown" "captivity" ...
##  $ Sample size                     : Ord.factor w/ 5 levels "tiny"<"small"<..: 3 4 3 3 3 3 3 3 3 4 ...
##  $ Data quality                    : Ord.factor w/ 4 levels "low"<"questionable"<..: 3 3 3 3 3 2 3 3 3 3 ...
##  $ IMR (per yr)                    : num  NA 0.05 NA NA NA NA NA NA NA NA ...
##  $ MRDT (yrs)                      : num  NA 0.04 NA NA NA NA NA NA NA NA ...
##  $ Metabolic rate (W)              : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ Body mass (g)                   : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ Temperature (K)                 : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ References                      : chr  "1294,1295,1296" "2,20,32,47,53,68,69,240,241,242,243,274,602,981,1150" "63,407,408,741,805,806,808,812,815,828,830,831,847,848,902,908,1143,1166" "1293" ...
##  - attr(*, "spec")=List of 2
##   ..$ cols   :List of 31
##   .. ..$ HAGRID                          : list()
##   .. .. ..- attr(*, "class")= chr  "collector_character" "collector"
##   .. ..$ Kingdom                         : list()
##   .. .. ..- attr(*, "class")= chr  "collector_character" "collector"
##   .. ..$ Phylum                          : list()
##   .. .. ..- attr(*, "class")= chr  "collector_character" "collector"
##   .. ..$ Class                           : list()
##   .. .. ..- attr(*, "class")= chr  "collector_character" "collector"
##   .. ..$ Order                           : list()
##   .. .. ..- attr(*, "class")= chr  "collector_character" "collector"
##   .. ..$ Family                          : list()
##   .. .. ..- attr(*, "class")= chr  "collector_character" "collector"
##   .. ..$ Genus                           : list()
##   .. .. ..- attr(*, "class")= chr  "collector_character" "collector"
##   .. ..$ Species                         : list()
##   .. .. ..- attr(*, "class")= chr  "collector_character" "collector"
##   .. ..$ Common name                     : list()
##   .. .. ..- attr(*, "class")= chr  "collector_character" "collector"
##   .. ..$ Female maturity (days)          : list()
##   .. .. ..- attr(*, "class")= chr  "collector_integer" "collector"
##   .. ..$ Male maturity (days)            : list()
##   .. .. ..- attr(*, "class")= chr  "collector_integer" "collector"
##   .. ..$ Gestation/Incubation (days)     : list()
##   .. .. ..- attr(*, "class")= chr  "collector_integer" "collector"
##   .. ..$ Weaning (days)                  : list()
##   .. .. ..- attr(*, "class")= chr  "collector_character" "collector"
##   .. ..$ Litter/Clutch size              : list()
##   .. .. ..- attr(*, "class")= chr  "collector_double" "collector"
##   .. ..$ Litters/Clutches per year       : list()
##   .. .. ..- attr(*, "class")= chr  "collector_double" "collector"
##   .. ..$ Inter-litter/Interbirth interval: list()
##   .. .. ..- attr(*, "class")= chr  "collector_integer" "collector"
##   .. ..$ Birth weight (g)                : list()
##   .. .. ..- attr(*, "class")= chr  "collector_double" "collector"
##   .. ..$ Weaning weight (g)              : list()
##   .. .. ..- attr(*, "class")= chr  "collector_character" "collector"
##   .. ..$ Adult weight (g)                : list()
##   .. .. ..- attr(*, "class")= chr  "collector_double" "collector"
##   .. ..$ Growth rate (1/days)            : list()
##   .. .. ..- attr(*, "class")= chr  "collector_double" "collector"
##   .. ..$ Maximum longevity (yrs)         : list()
##   .. .. ..- attr(*, "class")= chr  "collector_double" "collector"
##   .. ..$ Source                          : list()
##   .. .. ..- attr(*, "class")= chr  "collector_character" "collector"
##   .. ..$ Specimen origin                 : list()
##   .. .. ..- attr(*, "class")= chr  "collector_character" "collector"
##   .. ..$ Sample size                     :List of 3
##   .. .. ..$ levels    : chr  "tiny" "small" "medium" "large" ...
##   .. .. ..$ ordered   : logi TRUE
##   .. .. ..$ include_na: logi FALSE
##   .. .. ..- attr(*, "class")= chr  "collector_factor" "collector"
##   .. ..$ Data quality                    :List of 3
##   .. .. ..$ levels    : chr  "low" "questionable" "acceptable" "high"
##   .. .. ..$ ordered   : logi TRUE
##   .. .. ..$ include_na: logi FALSE
##   .. .. ..- attr(*, "class")= chr  "collector_factor" "collector"
##   .. ..$ IMR (per yr)                    : list()
##   .. .. ..- attr(*, "class")= chr  "collector_double" "collector"
##   .. ..$ MRDT (yrs)                      : list()
##   .. .. ..- attr(*, "class")= chr  "collector_double" "collector"
##   .. ..$ Metabolic rate (W)              : list()
##   .. .. ..- attr(*, "class")= chr  "collector_double" "collector"
##   .. ..$ Body mass (g)                   : list()
##   .. .. ..- attr(*, "class")= chr  "collector_double" "collector"
##   .. ..$ Temperature (K)                 : list()
##   .. .. ..- attr(*, "class")= chr  "collector_double" "collector"
##   .. ..$ References                      : list()
##   .. .. ..- attr(*, "class")= chr  "collector_character" "collector"
##   ..$ default: list()
##   .. ..- attr(*, "class")= chr  "collector_guess" "collector"
##   ..- attr(*, "class")= chr "col_spec"

Using str() is a quick way to see the columns in the dataset and the type of data in each. Of interest here are the taxonomic columns, the Adult weight (g) column, and the Maximum longevity (yrs) column. You can also use head() to look at the first few rows:

head(anage)
## # A tibble: 6 x 31
##   HAGRID Kingdom Phylum Class Order Family Genus Species `Common name`
##   <chr>  <chr>   <chr>  <chr> <chr> <chr>  <chr> <chr>   <chr>        
## 1 00003  Animal… Arthr… Bran… Dipl… Daphn… Daph… pulica… Daphnia      
## 2 00005  Animal… Arthr… Inse… Dipt… Droso… Dros… melano… Fruit fly    
## 3 00006  Animal… Arthr… Inse… Hyme… Apidae Apis  mellif… Honey bee    
## 4 00008  Animal… Arthr… Inse… Hyme… Formi… Card… obscur… Cardiocondyl…
## 5 00009  Animal… Arthr… Inse… Hyme… Formi… Lasi… niger   Black garden…
## 6 00010  Animal… Arthr… Inse… Hyme… Formi… Phei… dentata Pheidole den…
## # ... with 22 more variables: `Female maturity (days)` <int>, `Male
## #   maturity (days)` <int>, `Gestation/Incubation (days)` <int>, `Weaning
## #   (days)` <chr>, `Litter/Clutch size` <dbl>, `Litters/Clutches per
## #   year` <dbl>, `Inter-litter/Interbirth interval` <int>, `Birth weight
## #   (g)` <dbl>, `Weaning weight (g)` <chr>, `Adult weight (g)` <dbl>,
## #   `Growth rate (1/days)` <dbl>, `Maximum longevity (yrs)` <dbl>,
## #   Source <chr>, `Specimen origin` <chr>, `Sample size` <ord>, `Data
## #   quality` <ord>, `IMR (per yr)` <dbl>, `MRDT (yrs)` <dbl>, `Metabolic
## #   rate (W)` <dbl>, `Body mass (g)` <dbl>, `Temperature (K)` <dbl>,
## #   References <chr>

Because read_tsv() returns a tibble, calling anage by itself also prints only the first ten rows of the dataset, along with as many columns as fit on screen.
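Several of the AnAge column names contain spaces and parentheses, so they have to be wrapped in backticks (or passed as strings) whenever they are referenced. A minimal sketch, using a toy data frame built from the first three rows of the str() output above (the names toy and picked are only for illustration):

```r
library(dplyr)

# First three rows as shown in the str() output; adult weight is NA for all three
toy <- data.frame(
  `Common name` = c("Daphnia", "Fruit fly", "Honey bee"),
  `Adult weight (g)` = c(NA_real_, NA_real_, NA_real_),
  `Maximum longevity (yrs)` = c(0.19, 0.3, 8),
  check.names = FALSE   # keep the spaces and parentheses in the column names
)

# Backticks let dplyr verbs refer to non-syntactic column names
picked <- toy %>% 
  select(`Common name`, `Maximum longevity (yrs)`)
picked

# Base R alternative: index with the name as a string
toy[["Maximum longevity (yrs)"]]
```

The same backtick quoting is what the plotting code in the rest of this section relies on inside aes().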

All species

Let’s try graphing now. First, we will graph all the species; we will graph the adult weights versus their maximum lifespan, and color the datapoints by their Phylum:

# Create the basic plot
p.all <- anage %>% 
  ggplot(
    aes(`Adult weight (g)`, `Maximum longevity (yrs)`, color=Phylum, text=str_glue("Common Name: {`Common name`}<br>Data Quality: {`Data quality`}<br>Sample size: {`Sample size`}"))
  ) +
  geom_point(size=0.5) +
  scale_x_log10() +
  scale_y_log10() +
  labs(
    title='AnAge Species Adult Weight vs Lifespan',
    x='Adult Weight - log(g)',
    y='Maximum Longevity - log(yrs)'
  ) + 
  theme_pubclean() + 
  labs_pubr()
# Output the interactive plot
ggplotly(p.all)

(Note that both axes use a log scale. We do this because we want to highlight orders-of-magnitude changes over small-scale ones: on a log scale, a change from 1 to 10 grams covers the same distance as a change from 100 to 1000 grams, even though the second change is far larger in absolute terms.)
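As a quick illustration of what the log scale does, the sketch below compares step sizes on the raw and log10 scales for a made-up vector of weights (weights_g is a hypothetical name and is not part of the AnAge analysis):

```r
# Hypothetical weights, each ten times larger than the previous one
weights_g <- c(1, 10, 100, 1000)

# On the raw scale the steps grow with the values themselves
raw_steps <- diff(weights_g)
raw_steps

# On the log10 scale every tenfold change is the same size
log_steps <- diff(log10(weights_g))
log_steps
```

This is why points spanning many orders of magnitude in weight spread out evenly along a log axis instead of bunching up near zero.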

You can see that the graph already has a clear upwards trend! However, there’s a bit of an issue that’s striking in the color scheme; where are our non-chordate species? The first guess I have is that it relates to size measurements - that’s relatively easy to check:

anage %>% 
  filter(Phylum != "Chordata") %>% 
  select(Kingdom, Phylum, Genus, Species, `Common name`, `Maximum longevity (yrs)`, contains("weight"))
## # A tibble: 19 x 9
##    Kingdom Phylum Genus Species `Common name` `Maximum longev…
##    <chr>   <chr>  <chr> <chr>   <chr>                    <dbl>
##  1 Animal… Arthr… Daph… pulica… Daphnia                   0.19
##  2 Animal… Arthr… Dros… melano… Fruit fly                 0.3 
##  3 Animal… Arthr… Apis  mellif… Honey bee                 8   
##  4 Animal… Arthr… Card… obscur… Cardiocondyl…             0.5 
##  5 Animal… Arthr… Lasi… niger   Black garden…            28   
##  6 Animal… Arthr… Phei… dentata Pheidole den…             0.4 
##  7 Animal… Arthr… Bicy… anynana Squinting bu…             0.5 
##  8 Animal… Arthr… Homa… americ… American lob…           100   
##  9 Animal… Cnida… Turr… nutric… Immortal jel…            NA   
## 10 Animal… Echin… Stro… franci… Red sea urch…           200   
## 11 Animal… Echin… Stro… purpur… Purple sea u…            50   
## 12 Animal… Mollu… Arct… island… Ocean quahog…           507   
## 13 Animal… Nemat… Caen… elegans Roundworm                 0.16
## 14 Animal… Porif… Cina… antarc… Epibenthic s…          1550   
## 15 Animal… Porif… Scol… joubini Hexactinelli…         15000   
## 16 Plantae Pinop… Pinus longae… Great Basin …          5062   
## 17 Fungi   Ascom… Sacc… cerevi… Baker's yeast             0.04
## 18 Fungi   Ascom… Schi… pombe   Fission yeast            NA   
## 19 Fungi   Ascom… Podo… anseri… Filamentous …            NA   
## # ... with 3 more variables: `Birth weight (g)` <dbl>, `Weaning weight
## #   (g)` <chr>, `Adult weight (g)` <dbl>
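
The printed tibble hides the weight columns, so it can help to count the non-missing adult weights per phylum directly. A minimal sketch on a toy data frame (the toy object and its weight values are hypothetical placeholders, not AnAge data); the same pipeline should work on anage itself:

```r
library(dplyr)

# Toy stand-in for anage; the weight values are placeholders, NOT real AnAge entries
toy <- data.frame(
  Phylum = c("Chordata", "Chordata", "Arthropoda", "Porifera"),
  `Adult weight (g)` = c(20, NA, NA, NA),
  check.names = FALSE
)

# For each phylum, count the species and how many have a recorded adult weight
weight_coverage <- toy %>% 
  group_by(Phylum) %>% 
  summarise(
    n_species = n(),
    n_with_weight = sum(!is.na(`Adult weight (g)`))
  )
weight_coverage
```

A phylum whose n_with_weight is zero cannot appear in a plot that uses adult weight on one of its axes.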

None of the non-chordates in the AnAge database have any weight information - go figure! However, from the lifespan, we can see that some of these live a ridiculously long time:

# Sort anage by maximum longevity and show the six longest-lived species:
anage %>% 
  arrange(desc(`Maximum longevity (yrs)`)) %>% 
  head() %>% 
  select(Kingdom, Phylum, Genus, Species, `Common name`, `Maximum longevity (yrs)`, `Adult weight (g)`, `Data quality`)
## # A tibble: 6 x 8
##   Kingdom Phylum Genus Species `Common name` `Maximum longev…
##   <chr>   <chr>  <chr> <chr>   <chr>                    <dbl>
## 1 Animal… Porif… Scol… joubini Hexactinelli…            15000
## 2 Plantae Pinop… Pinus longae… Great Basin …             5062
## 3 Animal… Porif… Cina… antarc… Epibenthic s…             1550
## 4 Animal… Mollu… Arct… island… Ocean quahog…              507
## 5 Animal… Chord… Somn… microc… Greenland sh…              392
## 6 Animal… Chord… Bala… mystic… Bowhead whale              211
## # ... with 2 more variables: `Adult weight (g)` <dbl>, `Data
## #   quality` <ord>

Remember how I said that some animals live for millennia? Behold the humble sponge; specifically, Scolymastra joubini, which apparently lives for 15,000 years! It’s worth noting the “Data quality” column here; there’s some skepticism in the literature as to whether or not this estimate is real, since it’s so incredible. Runners-up include the Great Basin bristlecone pine, the ocean quahog clam, the Greenland shark, and my favorite, the bowhead whale!
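
Because Data quality was read in as an ordered factor, comparison operators work on it, which gives one possible way to set aside doubtful entries. A minimal sketch on a toy data frame (the species labels and quality values below are hypothetical placeholders, not AnAge entries):

```r
library(dplyr)

# Same level order as the col_factor() specification used when reading the file
quality_levels <- c("low", "questionable", "acceptable", "high")

# Hypothetical entries for illustration only
toy <- data.frame(
  species = c("species A", "species B", "species C", "species D"),
  `Data quality` = factor(
    c("questionable", "acceptable", "high", "low"),
    levels = quality_levels,
    ordered = TRUE
  ),
  check.names = FALSE,
  stringsAsFactors = FALSE
)

# Ordered factors can be compared against one of their own levels
reliable <- toy %>% 
  filter(`Data quality` >= "acceptable")
reliable
```

Whether to drop lower-quality entries at all is a judgment call; the plots in this section keep every entry and show the quality in the tooltip instead.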

Chordates

Moving on, let us graph the chordates, and color by class. Also, while we’re at it, let’s quantify the relationship between size and lifespan using a linear regression:

anage.chordata <- anage %>% 
  filter(
    Phylum=="Chordata",
    !is.na(`Adult weight (g)`),
    !is.na(`Maximum longevity (yrs)`)
      )


# Basic Graph
p.chordata <- anage.chordata %>% 
  ggplot(
    aes(`Adult weight (g)`, `Maximum longevity (yrs)`, color=Class, text=str_glue("Common Name: {`Common name`}<br>Data Quality: {`Data quality`}<br>Sample size: {`Sample size`}"))
  ) +
  geom_point(size=0.5) +
  scale_x_log10() +
  scale_y_log10() +
  geom_smooth(
    method='lm', 
    aes(`Adult weight (g)`, `Maximum longevity (yrs)`), 
    inherit.aes = FALSE,
    col="black",
    lty="dashed"
    )+
  labs(
    title='Chordates Adult Weight vs Lifespan',
    x='Adult Weight - log(g)',
    y='Maximum Longevity - log(yrs)'
  ) +
  theme_pubclean() + 
  labs_pubr()

# Output the interactive plot
ggplotly(p.chordata)
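
Since both axes are log10-scaled before geom_smooth() fits its line, the dashed line corresponds to a linear regression of log10(longevity) on log10(weight). To get the actual numbers, the same fit can be run with lm(). The sketch below uses synthetic data that follow an exact power law, so the recovered slope is known in advance; the synthetic data frame and its values are made up for illustration and are not taken from AnAge:

```r
# Synthetic data: longevity = 2 * weight^0.2 (illustrative values, NOT AnAge data)
synthetic <- data.frame(weight_g = 10^(1:6))
synthetic$longevity_yrs <- 2 * synthetic$weight_g^0.2

# Fit a straight line in log-log space, the same transformation the plot axes use
fit <- lm(log10(longevity_yrs) ~ log10(weight_g), data = synthetic)

# The slope is the power-law exponent; the intercept is log10 of the prefactor
slope <- unname(coef(fit)[2])
intercept <- unname(coef(fit)[1])
slope
10^intercept
```

On the real data, the equivalent call would presumably be lm(log10(`Maximum longevity (yrs)`) ~ log10(`Adult weight (g)`), data = anage.chordata), with summary() on the result giving the slope, its standard error, and the R-squared.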