performing principal components analysis of the auxiliary variables, and including a small number of components in the imputation model instead of the original variables.
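As a rough illustration of the idea above, the sketch below reduces a set of auxiliary variables to a few principal components and uses those components as predictors when filling in a target variable. Everything in it is an assumption made for the example: the synthetic data, the names `pca_scores` and `impute_with_components`, the choice of two components, and the requirement that the auxiliary variables are fully observed. The fill-in step is a plain least-squares regression standing in for whatever imputation model you actually use.

```python
import numpy as np

def pca_scores(aux, n_components):
    """Return the first n_components principal component scores of aux (rows = cases)."""
    # Standardize columns so no variable dominates because of its scale.
    z = (aux - aux.mean(axis=0)) / aux.std(axis=0)
    # SVD of the standardized matrix; rows of vt are the principal axes.
    _, _, vt = np.linalg.svd(z, full_matrices=False)
    return z @ vt[:n_components].T

def impute_with_components(y, scores):
    """Fill NaNs in y with least-squares predictions from the component scores."""
    observed = ~np.isnan(y)
    design = np.column_stack([np.ones(len(y)), scores])
    coef, *_ = np.linalg.lstsq(design[observed], y[observed], rcond=None)
    filled = y.copy()
    filled[~observed] = design[~observed] @ coef
    return filled

# Synthetic data (hypothetical): 10 auxiliary variables driven by 2 latent factors.
rng = np.random.default_rng(0)
n = 200
latent = rng.normal(size=(n, 2))
aux = latent @ rng.normal(size=(2, 10)) + 0.1 * rng.normal(size=(n, 10))
y = latent[:, 0] + 0.5 * latent[:, 1] + 0.1 * rng.normal(size=n)

# Knock out 40 values of the target variable.
y_missing = y.copy()
y_missing[rng.choice(n, size=40, replace=False)] = np.nan

# Two components replace the ten original auxiliary variables.
scores = pca_scores(aux, n_components=2)
y_filled = impute_with_components(y_missing, scores)
```

The imputation model here sees two predictors instead of ten, which is the point of the technique. Note that a deterministic regression fill like this is single imputation; in a multiple imputation workflow the same component scores would simply be passed in as predictors.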
Impute composite variables instead of individual components
techniques may perform well, this is rarely the case, so you need a few backups.
Identifying the Type of Missingness
The first step in implementing an effective imputation strategy is identifying why the values are missing. Even though each case is unique, missingness can be grouped into three broad categories:
Missing Completely At Random (MCAR): the data are missing purely by chance. Examples include accidental data entry mistakes, temporary sensor failures, or more generally any missing data that is not associated with an outside factor. The amount of missingness is usually low.
Missing At Random (MAR): this is a broader case than MCAR. Even though the missing data may seem random at first glance, the missingness has some systematic relationship with the other observed features, for example, data missing from observational equipment during scheduled maintenance breaks. The number of null values may vary.
Missing Not At Random (MNAR):