For numeric variables with only a few distinct values (say 2-3), the percentile values are the same across many points. As a result, the condition in np.where(arr > arr[start]) can break and return the wrong lowest percentile, leaving the program stuck in the while loop.
def get_next_range(arr, group_range, start):
    if group_range + start >= 100:
        return 100
    elif (100 - group_range/2) < start + group_range:
        return 100
    elif arr[-1] == arr[start]:
        return 100
    elif (arr[start+group_range] == arr[start]) or (arr[start] < 0):
        return np.max([np.min(np.where(arr > arr[start])), np.min(np.where(arr >= 0))])
    else:
        return group_range + start
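As a rough illustration of how the strict comparison can misfire, the sketch below uses a made-up array (not data from this report) in which two entries that should be equal differ only by floating-point noise. Whether np.percentile produces exactly this kind of noise for a given dataset is an assumption; the point is only that `arr > arr[start]` treats a tiny difference as a real increase.

```python
import numpy as np

# Hypothetical percentile array: 0.1 + 0.2 evaluates to 0.30000000000000004,
# so it compares strictly greater than 0.3 even though it "should" be equal.
arr = np.array([0.3, 0.1 + 0.2, 0.3, 1.0])
start = 0

# Same lookup as in get_next_range: lowest index whose value exceeds arr[start].
next_idx = int(np.min(np.where(arr > arr[start])))

# The lookup lands on index 1 (the noisy duplicate) instead of index 3,
# the first genuinely larger value.
print(next_idx)
```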
To fix this, round the percentile values to a fixed number of decimal places after computing them. Something like the following fixes the issue:
percentiles = np.around(np.array([np.percentile(df1[var], p) for p in range(0, 100)]), decimals=5)
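A minimal, self-contained sketch of the rounding idea follows. The helper name `rounded_percentiles` and the sample data are hypothetical, and passing all percentile ranks to np.percentile in one call is a substitution for the list comprehension above, not the project's own code.

```python
import numpy as np

def rounded_percentiles(values, decimals=5):
    """Return percentiles 0..99 of values, rounded to a fixed number of decimals."""
    raw = np.percentile(values, np.arange(100))
    return np.around(raw, decimals=decimals)

# Made-up column with only two distinct values.
values = np.array([0.0] * 50 + [1.0] * 50)
percentiles = rounded_percentiles(values)

# Made-up array with floating-point noise: 0.1 + 0.2 is slightly above 0.3.
# After rounding, the near-equal entries compare as equal, so the strict
# comparison skips them and finds the first genuinely larger value.
noisy = np.around(np.array([0.3, 0.1 + 0.2, 0.3, 1.0]), decimals=5)
next_idx = int(np.min(np.where(noisy > noisy[0])))
print(next_idx)
```

Five decimals is the value suggested above; a dataset whose meaningful differences are smaller than that would need a larger `decimals`.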