Question:

How can I calculate the total duration of overlapping periods (e.g. of a specific error category)?

How can I deal with multiple categories if I want to count overlapping periods only for the category with higher priority?


Solution:

Below are three functions that calculate these durations. To use them, load the required packages, then copy and paste the functions you need at the top of your script (this is important, because a function must be defined before it is called). After that you can call them anywhere further down in your script.


Script:

# required packages
import pandas as pd
import datetime as dt
import numpy as np

def unifyOverlappingPeriods(df):
    """
    Function name:    unifyOverlappingPeriods
    Inputs:           df ... pandas DataFrame with one period per row; the columns start and stop hold the start and stop DateTime as Unix timestamps
    Outputs:          pandas DataFrame with the columns start and stop, one row per unified (non-overlapping) period of the input df
    Description:      This function merges all overlapping periods in df and returns the resulting non-overlapping periods
    """

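    # Sort by start time so that overlapping periods end up in adjacent rows
    df = df.sort_values(by='start').reset_index(drop=True)
    # A new group begins wherever a period starts after the latest stop seen so far
    # (the first row is compared against NaN, which evaluates to False, i.e. group 0)
    df['indx'] = (df['start'] > df['stop'].cummax().shift()).cumsum()
    # Collapse each group into one period from its earliest start to its latest stop
    return df.groupby('indx').agg({'start': 'min', 'stop': 'max'}).reset_index()[['start', 'stop']]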
    

def sumUnifiedPeriods(df):
    """
    Function name:    sumUnifiedPeriods
    Inputs:           df ... pandas DataFrame with one period per row; the columns start and stop hold the start and stop DateTime as Unix timestamps
    Outputs:          sum of all periods with every overlap counted only once, in the unit of the input timestamps (milliseconds for Unix timestamps in ms)
    Description:      This function unifies all periods given in the input data frame and returns the total duration of the resulting non-overlapping periods; an empty data frame yields 0
    """
    if len(df) == 0:
        return 0
    df = unifyOverlappingPeriods(df)
    return (df['stop'] - df['start']).sum()

def sumPeriodsNonOverlappingWithHigherPrio(df_highPrio, df_lowPrio):
    """
    Function name:    sumPeriodsNonOverlappingWithHigherPrio
    Inputs:           df_highPrio ... data frame of the high priority events, with the columns start and stop holding the start and stop DateTime as Unix timestamps
                      df_lowPrio ... data frame of the low priority events, with the columns start and stop holding the start and stop DateTime as Unix timestamps
    Outputs:          sum of the unified low priority periods that do not overlap with any high priority event
    Description:      This function returns the total duration of the unified/non-overlapping periods of the low priority events in df_lowPrio, excluding any time that overlaps with a higher priority event in df_highPrio. It is computed as the duration of all events combined minus the duration of the high priority events
    """
    sum_df_highPrio = sumUnifiedPeriods(df_highPrio)
    
    # DataFrame.append was removed in pandas 2.0; pd.concat gives the same result
    df_unified = pd.concat([df_highPrio, df_lowPrio], ignore_index=True, sort=True)
    sum_df_unified = sumUnifiedPeriods(df_unified)
    
    return (sum_df_unified - sum_df_highPrio)
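

Example:

The following is a minimal usage sketch and not part of the original script. The sample periods, the helper to_ms and the variable names are made up for illustration. The three functions are restated in condensed form (using pd.concat, which assumes a current pandas version) only so that the block runs on its own; if the functions from the script above are already defined, these definitions can be skipped.

```python
import pandas as pd

def unifyOverlappingPeriods(df):
    # Condensed restatement: merge overlapping periods into disjoint ones
    df = df.sort_values(by='start').reset_index(drop=True)
    df['indx'] = (df['start'] > df['stop'].cummax().shift()).cumsum()
    return df.groupby('indx').agg({'start': 'min', 'stop': 'max'}).reset_index()[['start', 'stop']]

def sumUnifiedPeriods(df):
    # Total duration with every overlap counted only once
    if len(df) == 0:
        return 0
    df = unifyOverlappingPeriods(df)
    return (df['stop'] - df['start']).sum()

def sumPeriodsNonOverlappingWithHigherPrio(df_highPrio, df_lowPrio):
    # Duration of all events combined minus duration of the high priority events
    df_unified = pd.concat([df_highPrio, df_lowPrio], ignore_index=True, sort=True)
    return sumUnifiedPeriods(df_unified) - sumUnifiedPeriods(df_highPrio)

def to_ms(text):
    # Hypothetical helper: parse a UTC time string into a Unix timestamp in ms
    return int(pd.Timestamp(text, tz='UTC').timestamp() * 1000)

# Assumed sample data: two overlapping high priority periods (08:00-09:00 once unified)
df_high = pd.DataFrame({
    'start': [to_ms('2021-01-01 08:00'), to_ms('2021-01-01 08:20')],
    'stop':  [to_ms('2021-01-01 08:30'), to_ms('2021-01-01 09:00')],
})
# Assumed sample data: one low priority period overlapping the high priority
# block by 15 minutes, and one separate 10 minute period
df_low = pd.DataFrame({
    'start': [to_ms('2021-01-01 08:45'), to_ms('2021-01-01 10:00')],
    'stop':  [to_ms('2021-01-01 09:15'), to_ms('2021-01-01 10:10')],
})

total_high = sumUnifiedPeriods(df_high)                          # 60 min in ms
low_only = sumPeriodsNonOverlappingWithHigherPrio(df_high, df_low)  # 25 min in ms

print(total_high / 60000, "minutes of high priority events")
print(low_only / 60000, "minutes of low priority events outside high priority")
```

In this sample the low priority event from 08:45 to 09:15 contributes only the 15 minutes after 09:00, because the first 15 minutes are already covered by the high priority events; together with the separate 10 minute period this gives 25 minutes.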