Preview features used by data frame analytics

Generally available; Added in 7.13.0

POST /_ml/data_frame/analytics/{id}/_preview

All methods and paths for this operation:

GET /_ml/data_frame/analytics/_preview
POST /_ml/data_frame/analytics/_preview
GET /_ml/data_frame/analytics/{id}/_preview
POST /_ml/data_frame/analytics/{id}/_preview

Preview the extracted features used by a data frame analytics config.

Required authorization

  • Cluster privileges: monitor_ml

Path parameters

  • id string

    Identifier for the data frame analytics job. Required only for the paths that include {id}.
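The path variants listed above differ only in whether {id} is present. GET and POST are both accepted for each path. The following is a minimal sketch that assumes you build request paths yourself rather than through a client. `preview_path` is a hypothetical helper and is not part of any Elasticsearch client.

```python
from typing import Optional
from urllib.parse import quote


def preview_path(job_id: Optional[str] = None) -> str:
    """Return the _preview path, with {id} filled in only when a job id is given."""
    if job_id is None:
        # No id: the config to preview is sent in the request body instead.
        return "/_ml/data_frame/analytics/_preview"
    # Percent-encode the id so it stays a single path segment.
    return f"/_ml/data_frame/analytics/{quote(job_id, safe='')}/_preview"
```

For example, `preview_path("house-price-job")` yields `/_ml/data_frame/analytics/house-price-job/_preview`. The job name here is hypothetical.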

application/json

Body

  • config object

    A data frame analytics config, as described in the create data frame analytics jobs API. The id and dest properties do not need to be provided for this API.

    • source object Required
      • index string | array[string] Required

        Index or indices on which to perform the analysis. It can be a single index or index pattern as well as an array of indices or patterns. NOTE: If your source indices contain documents with the same IDs, only the document that is indexed last appears in the destination index.

      • runtime_mappings object

        Definitions of runtime fields that will become part of the mapping of the destination index.

        • * object Additional properties
      • _source object

        Specify `includes` and/or `excludes` patterns to select which fields will be present in the destination. Fields that are excluded cannot be included in the analysis.

        • includes array[string]

          An array of strings that defines the fields that will be included in the analysis.

        • excludes array[string]

          An array of strings that defines the fields that will be excluded from the analysis. You do not need to add fields with unsupported data types to excludes; these fields are excluded from the analysis automatically.

      • query object

        The Elasticsearch query domain-specific language (DSL). This value corresponds to the query object in an Elasticsearch search POST body. All the options that are supported by Elasticsearch can be used, as this object is passed verbatim to Elasticsearch. By default, this property has the following value: {"match_all": {}}.

    • analysis object Required
      • outlier_detection object

        The configuration information necessary to perform outlier detection. NOTE: Advanced parameters are for fine-tuning outlier detection. They are set automatically by hyperparameter optimization to give the minimum validation error. It is highly recommended to use the default values unless you fully understand the function of these parameters.

        • compute_feature_influence boolean

          Specifies whether the feature influence calculation is enabled.

          Default value is true.

        • feature_influence_threshold number

          The minimum outlier score that a document needs to have in order to calculate its feature influence score. Value range: 0-1.

          Default value is 0.1.

        • method string

          The method that outlier detection uses. Available methods are lof, ldof, distance_kth_nn, distance_knn, and ensemble. With ensemble, outlier detection uses an ensemble of different methods, normalizes their individual outlier scores, and combines them to obtain the overall outlier score.

          Default value is ensemble.

        • n_neighbors number

          The number of nearest neighbors that each outlier detection method uses to calculate its outlier score. When the value is not set, different values are used for different ensemble members. This default behavior helps improve the diversity of the ensemble. Only override it if you are confident that the value you choose is appropriate for the data set.

        • outlier_fraction number

          The proportion of the data set that is assumed to be outlying prior to outlier detection. For example, 0.05 means it is assumed that 5% of values are real outliers and 95% are inliers.

        • standardization_enabled boolean

          If true, the following operation is performed on the columns before computing outlier scores: (x_i - mean(x_i)) / sd(x_i).

          Default value is true.

    • model_memory_limit string
    • max_num_threads number
    • analyzed_fields object
      • includes array[string]

        An array of strings that defines the fields that will be included in the analysis.

      • excludes array[string]

        An array of strings that defines the fields that will be excluded from the analysis. You do not need to add fields with unsupported data types to excludes; these fields are excluded from the analysis automatically.
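Several of the documented constraints on the config can be checked on the client side before a preview request is sent. These include the 0-1 range of feature_influence_threshold, the list of allowed methods, the default match_all query, and the fact that id and dest can be omitted. The following is a minimal sketch. `build_preview_body` and `OUTLIER_METHODS` are hypothetical names, and Elasticsearch still validates the config itself.

```python
from typing import Any, Dict, List, Optional, Union

# Values documented for analysis.outlier_detection.method.
OUTLIER_METHODS = {"lof", "ldof", "distance_kth_nn", "distance_knn", "ensemble"}


def build_preview_body(
    index: Union[str, List[str]],
    outlier_detection: Dict[str, Any],
    query: Optional[Dict[str, Any]] = None,
    excludes: Optional[List[str]] = None,
) -> Dict[str, Any]:
    """Build a preview request body for an outlier detection config.

    No id or dest is added, because this API does not need them.
    """
    # Check the documented range and defaults before sending anything.
    threshold = outlier_detection.get("feature_influence_threshold", 0.1)
    if not 0 <= threshold <= 1:
        raise ValueError("feature_influence_threshold must be between 0 and 1")
    method = outlier_detection.get("method", "ensemble")
    if method not in OUTLIER_METHODS:
        raise ValueError(f"unknown outlier detection method: {method!r}")

    config: Dict[str, Any] = {
        "source": {
            "index": index,
            # Spell out the documented default query when none is given.
            "query": query if query is not None else {"match_all": {}},
        },
        "analysis": {"outlier_detection": outlier_detection},
    }
    if excludes:
        # Fields listed here are left out of the analysis.
        config["analyzed_fields"] = {"excludes": excludes}
    return {"config": config}
```

The Python client method shown in the examples below takes `config` as a keyword argument, so you could send the result as `client.ml.preview_data_frame_analytics(**body)`.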

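The following generic sketch shows what standardization_enabled and a distance-based outlier score do. This is not Elasticsearch's implementation. It applies the documented formula (x_i - mean(x_i)) / sd(x_i) to each column, assuming the population standard deviation because the formula does not say which one is used. It then scores each point by its Euclidean distance to its k-th nearest neighbor, which is the textbook idea behind a distance_kth_nn-style method, with k playing the role of n_neighbors. Without standardization, a feature with a large numeric scale would dominate the distances. The pairwise loop is O(n²), which is fine for a toy example but not for a large index.

```python
import math
from typing import List


def standardize(rows: List[List[float]]) -> List[List[float]]:
    """Apply (x_i - mean(x_i)) / sd(x_i) to every column i of a row-major table."""
    n = len(rows)
    columns = list(zip(*rows))
    means = [sum(col) / n for col in columns]
    # Population standard deviation (an assumption; the docs do not say which).
    sds = [
        math.sqrt(sum((v - m) ** 2 for v in col) / n)
        for col, m in zip(columns, means)
    ]
    # A constant column has sd 0; map it to 0 instead of dividing by zero.
    return [
        [(v - m) / s if s > 0 else 0.0 for v, m, s in zip(row, means, sds)]
        for row in rows
    ]


def kth_nn_distance_scores(rows: List[List[float]], k: int) -> List[float]:
    """Score each point by its Euclidean distance to its k-th nearest neighbor."""
    if not 1 <= k < len(rows):
        raise ValueError("k must be between 1 and len(rows) - 1")
    scores = []
    for i, p in enumerate(rows):
        # Distances from p to every other point, nearest first.
        dists = sorted(math.dist(p, q) for j, q in enumerate(rows) if j != i)
        scores.append(dists[k - 1])
    return scores


# Three clustered points and one far-away point.
points = [[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [10.0, 10.0]]
scores = kth_nn_distance_scores(standardize(points), k=1)
```

Here the last point receives the highest score, so it is the most outlying.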
Responses

  • 200 application/json
    • feature_values array[object] Required

      An array of objects that contain feature name and value pairs. The features have been processed and indicate what will be sent to the model for training.

      • * string Additional properties
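Each element of feature_values maps feature names to values, and the schema above types those values as strings. The following minimal sketch pivots the response into one list per feature, which makes it easier to inspect what would be sent to the model. `feature_columns` is a hypothetical helper. The sample response is made up to match the schema's shape and is not real API output.

```python
from typing import Any, Dict, List, Optional


def feature_columns(response: Dict[str, Any]) -> Dict[str, List[Optional[str]]]:
    """Pivot feature_values (one object per document) into one list per feature."""
    rows = response["feature_values"]
    # Union of feature names across all rows, in first-seen order.
    names: List[str] = []
    for row in rows:
        for name in row:
            if name not in names:
                names.append(name)
    # row.get() yields None for a row that lacks a feature, in case that happens.
    return {name: [row.get(name) for row in rows] for name in names}


# Made-up response, shaped like the schema above; not real API output.
sample = {"feature_values": [{"rooms": "3", "sqft": "1200"}, {"rooms": "4"}]}
columns = feature_columns(sample)
```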
Console:
POST _ml/data_frame/analytics/_preview
{
  "config": {
    "source": {
      "index": "houses_sold_last_10_yrs"
    },
    "analysis": {
      "regression": {
        "dependent_variable": "price"
      }
    }
  }
}

Python:
resp = client.ml.preview_data_frame_analytics(
    config={
        "source": {
            "index": "houses_sold_last_10_yrs"
        },
        "analysis": {
            "regression": {
                "dependent_variable": "price"
            }
        }
    },
)

JavaScript:
const response = await client.ml.previewDataFrameAnalytics({
  config: {
    source: {
      index: "houses_sold_last_10_yrs",
    },
    analysis: {
      regression: {
        dependent_variable: "price",
      },
    },
  },
});

Ruby:
response = client.ml.preview_data_frame_analytics(
  body: {
    "config": {
      "source": {
        "index": "houses_sold_last_10_yrs"
      },
      "analysis": {
        "regression": {
          "dependent_variable": "price"
        }
      }
    }
  }
)

PHP:
$resp = $client->ml()->previewDataFrameAnalytics([
    "body" => [
        "config" => [
            "source" => [
                "index" => "houses_sold_last_10_yrs",
            ],
            "analysis" => [
                "regression" => [
                    "dependent_variable" => "price",
                ],
            ],
        ],
    ],
]);

curl:
curl -X POST -H "Authorization: ApiKey $ELASTIC_API_KEY" -H "Content-Type: application/json" -d '{"config":{"source":{"index":"houses_sold_last_10_yrs"},"analysis":{"regression":{"dependent_variable":"price"}}}}' "$ELASTICSEARCH_URL/_ml/data_frame/analytics/_preview"

Java:
client.ml().previewDataFrameAnalytics(p -> p
    .config(c -> c
        .source(s -> s
            .index("houses_sold_last_10_yrs")
        )
        .analysis(a -> a
            .regression(r -> r
                .dependentVariable("price")
            )
        )
    )
);
Request example
An example body for a `POST _ml/data_frame/analytics/_preview` request.
{
  "config": {
    "source": {
      "index": "houses_sold_last_10_yrs"
    },
    "analysis": {
      "regression": {
        "dependent_variable": "price"
      }
    }
  }
}