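library(data.table) # `:=`, .N, .SD, and keyby come from data.table
library(dplyr)      # filter(), select(), mutate(), and the %>% pipe

# dt, hierarchy_value, cols_filter, and cols_serving are expected to be defined before this point.
out <- dt %>%
  filter(hierarchy == hierarchy_value) %>%
  # .[rowSums(is.na(.[, ..cols_filter])) < length(cols_filter), ] %>% # remove if all are NA
  .[rowSums(is.na(.[, ..cols_filter])) == 0, ] %>% # remove if any filter column is NA
  select(all_of(cols_serving), all_of(cols_filter)) %>%
  .[, N := .N - 1, keyby = cols_filter] %>% # number of matches aside from self
  # https://stackoverflow.com/questions/24151602/calculate-multiple-aggregations-on-several-variables-using-lapply-sd
  .[N > 1, ] %>% # drops groups with fewer than two matches besides self (N > 0 would drop only unmatched rows)
  .[, as.list(unlist(lapply(.SD, function(x) {
    x <- x[!is.infinite(x)] # ignore Inf and -Inf in every statistic
    list(
      mean = mean(x, na.rm = TRUE, trim = 0.1),          # 10% trimmed mean
      min = as.numeric(quantile(x, 0.1, na.rm = TRUE)),  # 10th percentile, used as a robust minimum
      max = as.numeric(quantile(x, 0.9, na.rm = TRUE))   # 90th percentile, used as a robust maximum
    )
  }))), by = c(cols_filter, "N"), .SDcols = cols_serving] %>%
  mutate(
    Match_Type = hierarchy_value,
    Match = paste(cols_filter, collapse = ", ")
  )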
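The per-column summary inside the `lapply(.SD, ...)` call above can be tried in isolation on a single vector. The following is a minimal sketch in base R only; the helper name `summarise_serving` and the sample vector are hypothetical and are not part of the original pipeline. It shows how dropping infinite values, `na.rm = TRUE`, and `trim = 0.1` interact.

```r
# Hypothetical helper (not in the original code): the same three statistics
# the pipeline computes for each serving column, applied to one vector.
summarise_serving <- function(x) {
  x <- x[!is.infinite(x)] # drop Inf and -Inf; NA is kept here and removed by na.rm below
  c(
    mean = mean(x, trim = 0.1, na.rm = TRUE),          # drops 10% of values from each end
    min  = as.numeric(quantile(x, 0.1, na.rm = TRUE)), # 10th percentile
    max  = as.numeric(quantile(x, 0.9, na.rm = TRUE))  # 90th percentile
  )
}

# Made-up sample data: one outlier, one NA, one Inf
x <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 1000, NA, Inf)
res <- summarise_serving(x)
print(res)
# mean = 5.5, min = 1.9, max = 108.1
```

After `Inf` and `NA` are removed, ten values remain, so `trim = 0.1` discards the single smallest and single largest value before averaging. The untrimmed mean of the same ten values would be 104.5, which is why the trimmed version is less sensitive to the outlier. The percentiles use the default `quantile()` interpolation, so they need not coincide with observed values.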
### Problems

* How to calculate the mean of a column that has mixed data types.
* How to find the mean of a column we have filtered for numeric values only.
* How to find the mean of a column we have filtered for numeric values only and handle NAs.
* How to find the mean of a column we have filtered for numeric values only, handle NAs, and trim 10% of the values from each end.
* How to find the mean of a column we have filtered for numeric values only, handle NAs, trim 10% of the values from each end, and round the result to 2 decimal places.
* How to find the mean of a column we have filtered for numeric values only, handle NAs, trim 10% of the values from each end, round the result to 2 decimal places, and store the result in a new column.
* How to find the mean of a column we have filtered for numeric values only, handle NAs, trim 10% of the values from each end, round the result to 2 decimal places, and store the result in a new column for each column in the dataset.
* How to find the mean of a column we have