pyreal.RealApp.produce_narrative_feature_contributions

RealApp.produce_narrative_feature_contributions(x_orig, model_id=None, x_train_orig=None, y_train=None, algorithm=None, shap_type=None, force_refit=False, training_size=None, format_output=True, num_features=5, select_by='absolute', llm_model='gpt3.5', detail_level='high', context_description=None, max_tokens=200, temperature=0.5)

Produce a feature contribution explanation, phrased as natural-language sentences generated by an LLM.

Parameters:
  • x_orig (DataFrame of shape (n_instances, n_features) or Series of length (n_features)) – Input(s) to explain

  • model_id (string or int) – ID of model to explain

  • x_train_orig (DataFrame) – Data to fit on, if not provided during initialization

  • y_train (DataFrame or Series) – Training targets to fit on, if not provided during initialization

  • algorithm (string) – Name of algorithm

  • shap_type (string) – If algorithm="shap", type of SHAP explainer to use

  • force_refit (Boolean) – If True, initialize and fit a new explainer even if the appropriate explainer already exists

  • training_size (int) – Number of rows to use in fitting explainer

  • format_output (bool) – If False, return output as a single list of narratives. Formatted outputs are more usable, but formatting may slow down runtimes on larger inputs

  • num_features (int) – Number of features to include in the explanation. If None, include all features

  • select_by (one of "absolute", "min", "max") – If num_features is not None, method to use for selecting which features to show. Not used if num_features is None

  • llm_model (string) – One of ["gpt3.5", "gpt4"]. LLM model to use to generate the explanation. GPT4 may provide better results, but is more expensive.

  • detail_level (string) – One of ["high", "low"]. Level of detail to include in the explanation. High detail includes precise contribution values; low detail includes only basic information about the features used.

  • context_description (string) – Description of the model's prediction task, in sentence format. This will be passed to the LLM and may help produce more accurate explanations. For example: "The model predicts the price of houses."

  • max_tokens (int) – Maximum number of tokens to use in the explanation

  • temperature (float) – LLM temperature to use. Values closer to 1 will produce more creative explanations. Values closer to 0 will produce more consistent or conservative explanations.

Returns:

dictionary (if x_orig is DataFrame) or DataFrame (if x_orig is Series)

One DataFrame per ID, with each row representing a feature and four columns: Feature Name, Feature Value, Contribution, and Average/Mode
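To illustrate how num_features and select_by interact, the sketch below filters a contribution table the way the documented options describe: "absolute" keeps the features with the largest contribution magnitudes, "max" the most positive, and "min" the most negative. The select_features helper and the sample data are hypothetical, for illustration only; they are not part of Pyreal's API or its internal implementation.

```python
import pandas as pd

def select_features(contributions, num_features=5, select_by="absolute"):
    """Illustrative sketch (not Pyreal's implementation) of how
    num_features and select_by might choose which features to show."""
    if num_features is None:
        # Documented behavior: include all features
        return contributions
    if select_by == "absolute":
        # Largest contributions by magnitude, positive or negative
        order = contributions["Contribution"].abs().sort_values(ascending=False)
    elif select_by == "max":
        # Most positive contributions first
        order = contributions["Contribution"].sort_values(ascending=False)
    elif select_by == "min":
        # Most negative contributions first
        order = contributions["Contribution"].sort_values(ascending=True)
    else:
        raise ValueError(f"Unknown select_by: {select_by!r}")
    return contributions.loc[order.index[:num_features]]

# Hypothetical contribution table for a house-price model
contribs = pd.DataFrame({
    "Feature Name": ["sqft", "age", "beds", "lot"],
    "Contribution": [12000.0, -8000.0, 500.0, -300.0],
})
top2 = select_features(contribs, num_features=2, select_by="absolute")
```

With select_by="absolute", top2 keeps "sqft" (+12000) and "age" (-8000), since those have the largest magnitudes, even though "age" pushes the prediction down.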