pyreal.RealApp.produce_narrative_feature_contributions

RealApp.produce_narrative_feature_contributions(x_orig, model_id=None, x_train_orig=None, y_train=None, algorithm=None, shap_type=None, force_refit=False, training_size=None, format_output=True, num_features=5, select_by='absolute', gpt_model_type='gpt-3.5', context_description=None, max_tokens=200)

Produce a feature contribution explanation written as natural-language sentences generated by an LLM. Do not use this function if your transformer list ends with a NarrativeTransformer; call produce_feature_contributions instead.

Parameters:

x_orig (DataFrame of shape (n_instances, n_features) or Series of length (n_features)) – Input(s) to explain
model_id (string or int) – ID of model to explain
x_train_orig (DataFrame) – Data to fit on, if not provided during initialization
y_train (DataFrame or Series) – Training targets to fit on, if not provided during initialization
algorithm (string) – Name of algorithm
shap_type (string) – If algorithm="shap", type of SHAP explainer to use
force_refit (bool) – If True, initialize and fit a new explainer even if the appropriate explainer already exists
training_size (int) – Number of rows to use in fitting explainer
format_output (bool) – If False, return output as a single list of narratives. Formatted outputs are more usable, but formatting may slow down runtimes on larger inputs
num_features (int) – Number of features to include in the explanation. If None, include all features
select_by (one of "absolute", "min", "max") – If num_features is not None, method to use for selecting which features to show. Not used if num_features is None
gpt_model_type (string) – One of ["gpt3.5", "gpt4"]. LLM to use to generate the explanation. GPT-4 may provide better results, but is more expensive.
context_description (string) – Description of the model’s prediction task, in sentence format. This will be passed to the LLM and may help produce more accurate explanations. For example: “The model predicts the price of houses.”
max_tokens (int) – Maximum number of tokens to use in the explanation
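
The interaction between num_features and select_by can be sketched in plain Python. This is only an illustration under assumed semantics ("absolute" keeps the largest contributions by magnitude, "max" the most positive, "min" the most negative); it is not pyreal's implementation, and select_features and the example values are hypothetical.

```python
def select_features(contributions, num_features=5, select_by="absolute"):
    """Choose which features to show (assumed semantics, not pyreal's code).

    contributions maps feature name -> signed contribution value.
    """
    if num_features is None:
        # With num_features=None, select_by is not used and all features are kept.
        return dict(contributions)

    if select_by == "absolute":
        # Largest magnitude first, regardless of sign.
        ranked = sorted(contributions.items(), key=lambda kv: abs(kv[1]), reverse=True)
    elif select_by == "max":
        # Most positive contributions first.
        ranked = sorted(contributions.items(), key=lambda kv: kv[1], reverse=True)
    elif select_by == "min":
        # Most negative contributions first.
        ranked = sorted(contributions.items(), key=lambda kv: kv[1])
    else:
        raise ValueError('select_by must be one of "absolute", "min", "max"')

    return dict(ranked[:num_features])


# Hypothetical contributions for a house-price model.
example = {"size": 12000.0, "age": -8000.0, "rooms": 3000.0, "garage": -500.0}
top_absolute = select_features(example, num_features=2, select_by="absolute")
top_positive = select_features(example, num_features=2, select_by="max")
top_negative = select_features(example, num_features=2, select_by="min")
```

Under these assumptions, "absolute" mixes positive and negative drivers, while "max" and "min" each focus the narrative on one direction of influence.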

Returns:

dictionary (if x_orig is DataFrame) or DataFrame (if x_orig is Series): One DataFrame per ID, with each row representing a feature, and four columns: Feature Name, Feature Value, Contribution, Average/Mode
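
A minimal usage sketch, using only the parameters documented above. The helper name explain_in_words is hypothetical, and real_app is assumed to be an already initialized RealApp whose transformer list does not end with a NarrativeTransformer. Running the call presumably also requires access to the chosen LLM, which is not shown here. gpt_model_type is left at its default.

```python
def explain_in_words(real_app, x_orig):
    """Request narrative explanations for x_orig from a RealApp-like object.

    real_app is assumed to expose produce_narrative_feature_contributions
    with the signature documented on this page.
    """
    return real_app.produce_narrative_feature_contributions(
        x_orig,
        num_features=3,          # mention only three features per narrative
        select_by="absolute",    # method used to pick those features
        context_description="The model predicts the price of houses.",
        max_tokens=200,          # upper bound on explanation length
    )
```

Passing context_description is optional, but the parameter description suggests it may help the LLM produce more accurate explanations.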