Question:

How can I calculate the total duration of overlapping periods (e.g. of a specific error category)?

How can I deal with multiple categories if I want to count overlapping periods only for the category with higher priority?


Solution:

Below are three functions for these calculations. To use them, load the required dplyr package with "library(dplyr)", then copy and paste the functions you need to the top of your script (this is important!) so that they are defined before they are called. Each function builds on the one before it, so copy the preceding functions as well.
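
The functions expect a data frame with numeric variables named start and stop. If your data holds date-times instead, a conversion along the following lines may help. This is only a sketch: the data frame name events, the sample timestamps, and the choice of milliseconds (taken from the function descriptions below) are assumptions for illustration.

```r
library(dplyr)

# Hypothetical sample data: three error periods as date-times in UTC
events <- tibble(
  start = as.POSIXct(c("2024-01-01 10:00:00", "2024-01-01 10:05:00", "2024-01-01 11:00:00"), tz = "UTC"),
  stop  = as.POSIXct(c("2024-01-01 10:10:00", "2024-01-01 10:20:00", "2024-01-01 11:30:00"), tz = "UTC")
)

# as.numeric() on POSIXct gives seconds since 1970-01-01 UTC;
# multiplying by 1000 turns that into a Unix timestamp in milliseconds
events_ms <- events %>%
  mutate(start = as.numeric(start) * 1000,
         stop  = as.numeric(stop) * 1000)
```

The resulting events_ms has the shape the functions take as their df argument.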


Script:

# -----------------------------------
# Function name:    unifyOverlappingPeriods    
# Inputs:           df ... data frame of periods with the variables start and stop, each given as a Unix timestamp
# Outputs:          data frame with the variables indx, start and stop, which represent the unified time periods of the input data frame df
# Description:      This function merges all overlapping periods in df and returns the resulting non-overlapping periods
# -----------------------------------
unifyOverlappingPeriods <- function( df ){
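    df %>%
        arrange( start ) %>%
        # indx is a group id: it increases by one whenever a period starts
        # after the latest stop seen so far, i.e. where a gap begins
        mutate( indx = c(0, cumsum( as.numeric(lead(start)) > cummax(as.numeric(stop)) )[-n()]) ) %>%
        group_by(indx) %>%
        # Each group collapses into one period from its earliest start to its latest stop
        summarise(start = min(start), stop = max(stop))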
}

# -----------------------------------
# Function name:    sumUnifiedPeriods     
# Inputs:           df ... data frame of periods with the variables start and stop, each given as a Unix timestamp
# Outputs:          total duration of all periods in milliseconds, with overlaps counted only once
# Description:      This function unifies all periods in the input data frame and returns the total duration in milliseconds of the resulting non-overlapping periods; an empty data frame gives 0
# -----------------------------------
sumUnifiedPeriods <- function( df ){
    if ( nrow(df) > 0 ) {
        df_unified <- df %>%
            unifyOverlappingPeriods()

        return(sum( df_unified$stop - df_unified$start ))
    } else {
        return(0)
    }
}

# -----------------------------------
# Function name:    sumPeriodsNonOverlappingWithHigherPrio
# Inputs:           df_highPrio ... data frame of the high priority events with the variables start and stop, each given as a Unix timestamp
#                   df_lowPrio ... data frame of the low priority events with the variables start and stop, each given as a Unix timestamp
# Outputs:          total duration of the unified low priority periods that do not overlap with any high priority event
# Description:      This function returns the total duration of the unified low priority periods in df_lowPrio, excluding any time that overlaps with a high priority event in df_highPrio. It is computed as the unified duration of both data frames together minus the unified duration of df_highPrio alone
# -----------------------------------
sumPeriodsNonOverlappingWithHigherPrio <- function( df_highPrio, df_lowPrio ){
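    # Total duration covered by the high priority events alone
    sum_df_highPrio <- sumUnifiedPeriods(df_highPrio)

    # Total duration covered by any event, high or low priority
    sum_df_unified <- bind_rows(df_highPrio, df_lowPrio) %>%
        sumUnifiedPeriods()

    # The difference is the low priority time not covered by a high priority event
    return(sum_df_unified - sum_df_highPrio)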
}
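

Example:

To make the grouping step in unifyOverlappingPeriods easier to follow, the sketch below repeats the same pipeline on a small data set and keeps the intermediate values as separate columns. The data frame periods, its values, and the helper columns latest_stop and new_group are made up for illustration and are not part of the functions above.

```r
library(dplyr)

# Hypothetical toy data in ms: rows 1 and 2 overlap, row 3 stands alone
periods <- tibble(start = c(0, 500, 2000), stop = c(1000, 1500, 2500))

steps <- periods %>%
  arrange(start) %>%
  mutate(
    latest_stop = cummax(stop),                  # latest stop seen up to this row
    new_group   = lead(start) > latest_stop,     # TRUE if the NEXT row starts after a gap
    indx        = c(0, cumsum(new_group)[-n()])  # shifted by one row: group id of this row
  )
# steps$indx is 0, 0, 1

# Collapse each group into a single period
unified <- steps %>%
  group_by(indx) %>%
  summarise(start = min(start), stop = max(stop))

# Total duration with the overlap counted only once
total_ms <- sum(unified$stop - unified$start)
# total_ms is 2000
```

Rows 1 and 2 share indx 0 and merge into one period from 0 to 1500, while row 3 gets indx 1. The total is 1500 + 500 = 2000 ms, compared with 1000 + 1000 + 500 = 2500 ms if the overlap were counted twice.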