I find that I often require two things from the same assumption-checking code:
- Fail the analysis if the assumptions are incorrect,
- Separate out two data frames: (1) a dataframe of the rows with faulty assumptions (to remand to data collection) and (2) a data frame that passes the checks (for further data analysis).
Alternatively, get a single data frame with a column that indicates whether each row passed the check.
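To make the two desired outputs concrete, here is a minimal sketch in plain pandas. It does not use engarde's API; `split_by_check`, `flag_by_check`, the `passed_check` column name, and the example data are all hypothetical names chosen for illustration, and it assumes a check can be expressed as a function returning one boolean per row.

```python
import pandas as pd


def split_by_check(df, check):
    """Return (passed, failed) frames.

    `check` is assumed to map a DataFrame to a boolean Series
    with one entry per row (True means the row is fine).
    """
    mask = check(df).astype(bool)
    return df[mask], df[~mask]


def flag_by_check(df, check, column="passed_check"):
    """Return a copy of `df` with a boolean column marking passing rows."""
    return df.assign(**{column: check(df).astype(bool)})


# Hypothetical example data: one row has an impossible age.
df = pd.DataFrame({"age": [34, -1, 52], "weight": [70.0, 82.5, None]})


def age_ok(d):
    return d["age"] >= 0


passed, failed = split_by_check(df, age_ok)   # two separate frames
flagged = flag_by_check(df, age_ok)           # one frame with an indicator column
```

The `failed` frame is what would go back to data collection, while `passed` continues into the analysis.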
I understand the original intention of engarde is to fail early, and it does provide some tools for (2), but there are two particular pain points:
- Getting back to a data frame with and without errors is a little tough. In some cases, that's easy: `verify_all` returns a dataframe in `AssertionError.args[1]`. In others, it is less so: `none_missing` returns a list of `(index, column)` tuples, which all have to be passed to `pandas.DataFrame.loc` separately.
- Engarde throws the first error it encounters, which means that any other checks that might fail will only be discovered when this error is worked around.
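Both pain points can be worked around by hand today, roughly as sketched below. This is an illustration under stated assumptions, not engarde's API: `rows_from_locations` and `run_all_checks` are hypothetical helpers, and `no_missing` and `ages_non_negative` are stand-in checks written here to mimic the behaviour described above (raising `AssertionError` with the details in `args[1]`).

```python
import pandas as pd


def rows_from_locations(df, locations):
    """Split `df` given a list of (index, column) tuples marking bad cells."""
    bad_index = pd.Index([idx for idx, _ in locations]).unique()
    failed = df.loc[bad_index]
    passed = df.drop(index=bad_index)
    return passed, failed


def run_all_checks(df, checks):
    """Run every check and collect the AssertionErrors instead of stopping at the first."""
    errors = {}
    for name, check in checks.items():
        try:
            check(df)
        except AssertionError as exc:
            errors[name] = exc
    return errors


# Stand-in checks (hypothetical, not engarde functions).
def no_missing(d):
    bad = [(i, c) for c in d.columns for i in d.index[d[c].isna().to_numpy()]]
    if bad:
        raise AssertionError("missing values", bad)  # list of (index, column)
    return d


def ages_non_negative(d):
    bad = d[d["age"] < 0]
    if len(bad):
        raise AssertionError("negative ages", bad)  # offending rows as a frame
    return d


df = pd.DataFrame({"age": [34, -1, 52], "weight": [70.0, 82.5, None]})

errors = run_all_checks(
    df, {"no_missing": no_missing, "ages_non_negative": ages_non_negative}
)
passed, failed = rows_from_locations(df, errors["no_missing"].args[1])
```

Here `errors` reports both failing checks in one pass, and the tuple list is converted into a passing and a failing frame with a single `.loc` call rather than one call per tuple.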
Can engarde be used for my use case, or is that too far away from engarde's philosophy?