Set up AutoML to train a time-series forecasting model with Python
APPLIES TO: Python SDK azureml v1

In this article, you learn how to set up AutoML training for time-series forecasting models with Azure Machine Learning automated ML in the Azure Machine Learning Python SDK.
For a low code experience, see the Tutorial: Forecast demand with automated machine learning for a time-series forecasting example using automated ML in the Azure Machine Learning studio.

Unlike classical time series methods, in automated ML, past time-series values are "pivoted" to become additional dimensions for the regressor together with other predictors. This approach incorporates multiple contextual variables and their relationship to one another during training. Since multiple factors can influence a forecast, this method aligns well with real world forecasting scenarios. For example, when forecasting sales, interactions of historical trends, exchange rate, and price all jointly drive the sales outcome.

Prerequisites

For this article, you need an Azure Machine Learning workspace and familiarity with setting up an automated ML experiment.
Training and validation data

The most important difference between a forecasting regression task type and a regression task type within automated ML is including a feature in your training data that represents a valid time series. A regular time series has a well-defined and consistent frequency and has a value at every sample point in a continuous time span.

Important: When training a model for forecasting future values, ensure all the features used in training can be used when running predictions for your intended horizon. For example, when creating a demand forecast, including a feature for current stock price could massively increase training accuracy. However, if you intend to forecast with a long horizon, you may not be able to accurately predict future stock values corresponding to future time-series points, and model accuracy could suffer.

You can specify
separate training data and validation data directly in the AutoMLConfig constructor. For time series forecasting, only Rolling Origin Cross Validation (ROCV) is used for validation by default. ROCV divides the series into training and validation data using an origin time point. Sliding the origin in time generates the cross-validation folds. This strategy preserves the integrity of the time series data and eliminates the risk of data leakage.
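The fold-generation idea behind ROCV can be sketched in a few lines of plain Python. This is an illustration of the concept, not the AutoML internals; the function name and signature are hypothetical.

```python
# Illustrative sketch of rolling-origin cross-validation fold generation.
def rocv_folds(n_points, horizon, n_folds):
    """Yield (train_indices, validation_indices) pairs.

    Each fold moves the forecast origin one horizon later, so validation
    windows never overlap and training data always precedes validation data.
    """
    folds = []
    for k in range(n_folds):
        origin = n_points - (n_folds - k) * horizon
        train = list(range(origin))
        valid = list(range(origin, origin + horizon))
        folds.append((train, valid))
    return folds

# 12 observations, horizon 2, 3 folds: origins fall at points 6, 8, and 10.
for train, valid in rocv_folds(12, 2, 3):
    print(len(train), valid)
```

Note how every validation window lies strictly after its training window, which is what prevents leakage from the future into the past.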
Pass your training and validation data as one dataset to the training_data parameter.
You can also bring your own validation data; learn more in Configure data splits and cross-validation in AutoML. Learn more about how AutoML applies cross validation to prevent over-fitting models.

Configure experiment

The AutoMLConfig object defines the settings and data necessary for an automated machine learning training job.

Supported models

Automated machine learning automatically tries different models and algorithms as part of the model creation and tuning process. As a user, there is no need for you to specify the algorithm. For forecasting experiments, both native time-series and deep learning models are part of the recommendation system.

Tip: Traditional regression models are also tested as part of the recommendation system for forecasting experiments. See a complete list of the supported models in the SDK reference documentation.

Configuration settings

Similar to a regression problem, you define standard training parameters like task type, number of iterations, training data, and number of cross-validations. Forecasting tasks require the time_column_name and forecast_horizon parameters to configure your experiment.

Important: Automatic time series identification is currently in public preview. This preview version is provided without a service-level agreement. Certain features might not be supported or might have constrained capabilities. For more information, see Supplemental Terms of Use for Microsoft Azure Previews.
The following code uses the ForecastingParameters class to define the forecasting parameters for your experiment training, including the time column name and the desired forecast horizon.
These ForecastingParameters can then be passed into your standard AutoMLConfig object along with the forecasting task type, primary metric, exit criteria, and training data.
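The wiring described above can be sketched as follows. This is a configuration sketch based on the azureml v1 SDK, not runnable as-is: the column names ("day_datetime", "demand"), the experiment name, and train_data are placeholders you replace with your own workspace assets.

```python
# Configuration sketch (azureml v1). Column names and data are placeholders.
from azureml.core import Workspace, Experiment
from azureml.train.automl import AutoMLConfig
from azureml.automl.core.forecasting_parameters import ForecastingParameters

forecasting_parameters = ForecastingParameters(
    time_column_name="day_datetime",   # hypothetical time column
    forecast_horizon=14,               # predict 14 periods ahead
)

automl_config = AutoMLConfig(
    task="forecasting",
    primary_metric="normalized_root_mean_squared_error",
    training_data=train_data,          # your registered TabularDataset
    label_column_name="demand",        # hypothetical target column
    n_cross_validations=3,
    forecasting_parameters=forecasting_parameters,
)

ws = Workspace.from_config()
experiment = Experiment(ws, "forecasting-example")
remote_run = experiment.submit(automl_config, show_output=True)
best_run, fitted_model = remote_run.get_output()
```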
The amount of data required to successfully train a forecasting model with automated ML is influenced by the forecast_horizon, n_cross_validations, and target_lags or target_rolling_window_size values specified in your configuration. The following formula calculates the amount of historic data needed to construct time series features.

Minimum historic data required: (2 x forecast_horizon) + n_cross_validations + max(max(target_lags), target_rolling_window_size)

An error exception is raised for any series in the dataset that does not meet the required amount of historic data for the relevant settings specified.

Featurization steps

In every automated machine learning experiment, automatic scaling and normalization techniques are applied to your data by default. These techniques are types of featurization that help certain algorithms that are sensitive to features on different scales. Learn more about default featurization steps in Featurization in AutoML. However, additional steps, such as detecting the time-series sample frequency and creating time-based features, are performed only for forecasting task types.
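The minimum-history formula above can be expressed directly in Python. The helper function itself is hypothetical; only the parameter names mirror the AutoML settings.

```python
# Hedged sketch of the minimum-history formula; the helper is hypothetical.
def minimum_history(forecast_horizon, n_cross_validations, target_lags=(0,),
                    target_rolling_window_size=0):
    """Minimum number of historic data points needed per series."""
    lookback = max(max(target_lags), target_rolling_window_size)
    return 2 * forecast_horizon + n_cross_validations + lookback

# Horizon 14, 3 CV folds, lags of 1 and 7 periods, rolling window of 3:
print(minimum_history(14, 3, target_lags=(1, 7), target_rolling_window_size=3))  # → 38
```

In this example the lag of 7 dominates the window of 3, so the lookback term contributes 7 points: 28 + 3 + 7 = 38.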
To view the full list of possible engineered features generated from time series data, see the TimeIndexFeaturizer Class.

Note: Automated machine learning featurization steps (feature normalization, handling missing data, converting text to numeric, and so on) become part of the underlying model. When using the model for predictions, the same featurization steps applied during training are applied to your input data automatically.

Customize featurization

You also have the option to customize your featurization settings to ensure that the data and features used to train your ML model result in relevant predictions. Supported customizations for forecasting tasks include column purpose updates and transformer parameter updates.
To customize featurizations with the SDK, specify "featurization": FeaturizationConfig in your AutoMLConfig object.

Note: The drop columns functionality is deprecated as of SDK version 1.19. Drop columns from your dataset as part of data cleansing, prior to consuming it in your automated ML experiment.
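A customization sketch using the v1 FeaturizationConfig might look like the following. The column names ("store_id", "price") are hypothetical, and the exact transformer parameters should be checked against the SDK reference for your version.

```python
# Sketch using the azureml v1 FeaturizationConfig; column names are hypothetical.
from azureml.automl.core.featurization import FeaturizationConfig

featurization_config = FeaturizationConfig()

# Force the data type of the "store_id" column to categorical.
featurization_config.add_column_purpose("store_id", "Categorical")

# Fill gaps in the "price" column with its median value.
featurization_config.add_transformer_params("Imputer", ["price"], {"strategy": "median"})

# Then pass it to AutoMLConfig via featurization=featurization_config.
```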
If you're using the Azure Machine Learning studio for your experiment, see how to customize featurization in the studio.

Optional configurations

Additional optional configurations are available for forecasting tasks, such as enabling deep learning and specifying a target rolling window aggregation. A complete list of additional parameters is available in the ForecastingParameters SDK reference documentation.

Frequency & target data aggregation

Leverage the
frequency, freq, parameter to help avoid failures caused by irregular data, that is, data that doesn't follow a set cadence such as hourly or daily data. For highly irregular data or for varying business needs, users can optionally set their desired forecast frequency, freq, and specify the target_aggregation_function to aggregate the target column of the time series. Supported aggregation operations for target column values include: sum, mean, min, and max.
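The effect of such an aggregation can be illustrated with pandas. This is only an analogy for what freq plus target_aggregation_function accomplish: irregular hourly observations are aggregated up to a regular daily frequency with the "sum" operation.

```python
import pandas as pd

# Illustrative only: aggregate an irregular hourly target to a daily frequency.
idx = pd.to_datetime([
    "2023-01-01 01:00", "2023-01-01 07:00", "2023-01-01 22:00",
    "2023-01-02 03:00", "2023-01-02 15:00",
])
demand = pd.Series([3.0, 5.0, 2.0, 4.0, 6.0], index=idx)

daily = demand.resample("D").sum()   # "sum" mirrors one supported aggregation
print(daily)
```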
Enable deep learning

Note: DNN support for forecasting in automated machine learning is in preview and not supported for local runs or runs initiated in Databricks.

You can also apply deep learning with deep neural networks (DNNs) to improve the scores of your model. Automated ML's deep learning allows for forecasting univariate and multivariate time series data. Deep learning models have three intrinsic capabilities: they can learn from arbitrary mappings from inputs to outputs, they support multiple inputs and outputs, and they can automatically extract patterns in input data that spans over long sequences.
To enable deep learning, set enable_dnn=True in the AutoMLConfig object.
To enable DNN for an AutoML experiment created in the Azure Machine Learning studio, see the task type settings in the studio UI how-to.

Target rolling window aggregation

Often the best information a forecaster can have is the recent value of the target. Target rolling window aggregations allow you to add a rolling aggregation of data values as features. Generating and using these features as extra contextual data helps with the accuracy of the trained model.

For example, say you want to predict energy demand. You might want to add a rolling window feature of three days to account for thermal changes of heated spaces. In this example, create this window by setting target_rolling_window_size=3 in the AutoMLConfig constructor.

The table shows the resulting feature engineering that occurs when window aggregation is applied. Columns for minimum, maximum, and sum are generated on a sliding window of three based on the defined settings. Each row has a new calculated feature; in the case of the timestamp for September 8, 2017 4:00AM, the maximum, minimum, and sum values are calculated using the demand values for September 8, 2017 1:00AM - 3:00AM. This window of three shifts along to populate data for the remaining rows. View a Python code example applying the target rolling window aggregate feature.

Short series handling

Automated ML considers a time series a short series if there are not enough data points to conduct the train and validation phases of model development. The number of data points varies for each experiment and depends on the max_horizon, the number of cross validation splits, and the length of the model lookback, that is, the maximum of history that's needed to construct the time-series features. Automated ML offers
short series handling by default with the short_series_handling_configuration parameter in the ForecastingParameters object. To enable short series handling, the freq parameter must also be defined.
The following table summarizes the available settings for short_series_handling_configuration.
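Returning to the energy demand example above, the rolling window feature engineering can be sketched with pandas. This is an illustration of the generated min/max/sum columns, not the AutoML featurizer itself; the column names are hypothetical.

```python
import pandas as pd

# Sketch of rolling-window features: for each timestamp, the min/max/sum of
# the three observations that precede it, mirroring the table in the article.
df = pd.DataFrame({
    "timestamp": pd.date_range("2017-09-08 01:00", periods=6, freq="h"),
    "demand": [7.0, 5.0, 6.0, 9.0, 4.0, 8.0],
})

window = df["demand"].rolling(window=3)
# shift(1) so the aggregates describe the three rows *before* each timestamp.
df["demand_min_3"] = window.min().shift(1)
df["demand_max_3"] = window.max().shift(1)
df["demand_sum_3"] = window.sum().shift(1)
print(df)
```

For the 4:00AM row, the aggregates are computed from the 1:00AM-3:00AM demand values, exactly as described for the table above.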
Warning: Padding may impact the accuracy of the resulting model, since artificial data is introduced just to get past training without failures. If many of the series are short, then you may also see some impact in explainability results.

Non-stationary time series detection and handling

A time series whose moments (mean and variance) change over time is called non-stationary. For example, time series that exhibit stochastic trends are non-stationary by nature. To visualize this, the image below plots a series that is generally trending upward. Now, compute and compare the mean (average) values for the first and the second half of the series. Are they the same? Here, the mean of the series in the first half of the plot is significantly smaller than in the second half. The fact that the mean of the series depends on the time interval one is looking at is an example of time-varying moments. (Here, the mean of a series is its first moment.)
Next, let's examine the image below, which plots the original series in first differences.
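The first-differencing idea can be made concrete with pandas. In this illustrative sketch, the level of the series drifts upward (its half-period means differ), while the first differences hover around a stable value.

```python
import pandas as pd

# Illustrative: a trending series has a drifting mean, while its first
# differences are roughly mean-stable.
values = pd.Series([10.0, 12.0, 11.0, 14.0, 13.0, 16.0, 15.0, 18.0])

first_half_mean = values[:4].mean()    # mean of the first half
second_half_mean = values[4:].mean()   # mean of the second half

diffed = values.diff().dropna()        # first differences
print(first_half_mean, second_half_mean, diffed.tolist())
```

The half-period means differ noticeably, but the differenced values oscillate around a constant level, which is the behavior a stationarity-restoring transform aims for.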
Machine learning models cannot inherently deal with stochastic trends or other well-known problems associated with non-stationary time series. As a result, their out-of-sample forecast accuracy will be poor if such trends are present. Automated ML automatically analyzes a time series dataset to check whether it is stationary. When non-stationary time series are detected, they are automatically first-differenced to mitigate the impact of non-stationarity.

Run the experiment

When you have your AutoMLConfig object ready, you can submit the experiment.
Forecasting with best model

Use the best model iteration to forecast values for data that wasn't used to train the model.

Evaluating model accuracy with a rolling forecast

Before you put a model into production, you should evaluate its accuracy on a test set held out from the training data. A best practice procedure is a so-called rolling evaluation, which rolls the trained forecaster forward in time over the test set, averaging error metrics over several prediction windows to obtain statistically robust estimates for some set of chosen metrics. Ideally, the test set for the evaluation is long relative to the model's forecast horizon; estimates of forecasting error may otherwise be statistically noisy and, therefore, less reliable. For example, suppose you train a model on daily sales to predict demand up to two weeks (14 days) into the future. If there is sufficient historic data available, you might reserve the final several months to even a year of the data for the test set. The rolling evaluation begins by generating a 14-day-ahead forecast for the first two weeks of the test set. Then, the forecaster is advanced by some number of days into the test set and you generate another 14-day-ahead forecast from the new position. The process continues until you get to the end of the test set. To do a rolling evaluation, you call the rolling_forecast() method of the fitted model, then compute desired metrics on the result.
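The mechanics of a rolling evaluation can be sketched without the azureml API at all. The following self-contained example rolls a deliberately naive "repeat the last value" forecaster over a test set with step size 1 and averages the absolute errors; the function and its signature are hypothetical.

```python
# Self-contained sketch of a rolling evaluation (not the azureml API).
def rolling_evaluation(train, test, horizon, step=1):
    """Return the mean absolute error averaged over all forecast windows."""
    history = list(train)
    errors = []
    origin = 0
    while origin + horizon <= len(test):
        last_value = history[-1]                      # naive forecast
        window = test[origin:origin + horizon]
        errors.extend(abs(actual - last_value) for actual in window)
        history.extend(test[origin:origin + step])    # advance the origin
        origin += step
    return sum(errors) / len(errors)

mae = rolling_evaluation(train=[10, 11, 12], test=[13, 14, 15, 16], horizon=2)
print(mae)  # → 1.5
```

With a real fitted model you would replace the naive forecast with the model's prediction from each origin; the windowing and averaging logic stays the same.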
In the above sample, the step size for the rolling forecast is set to 1, which means that the forecaster is advanced one period, or one day in our demand prediction example, at each iteration. The total number of forecasts returned therefore depends on the length of the test set and this step size.

Prediction into the future

The forecast_quantiles() function allows specifications of when predictions should start, unlike the predict() method, which is typically used for classification and regression tasks. In the following example, you first replace all values in y_pred with NaN; the forecast origin is then at the end of the training data. You can also use the forecast_destination parameter to forecast values out to a specified future date.
Often customers want to understand the predictions at a specific quantile of the distribution, for example, when the forecast is used to control inventory like grocery items or virtual machines for a cloud service. In such cases, the control point is usually something like "we want the item to be in stock and not run out 99% of the time". The following demonstrates how to specify which quantiles you'd like to see for your predictions, such as the 50th or 95th percentile. If you don't specify a quantile, then only the 50th percentile predictions are generated.
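The inventory intuition can be illustrated numerically. This is a conceptual sketch, not the forecast_quantiles() API: it simulates a distribution of possible demand outcomes and reads off the median and the 99th-percentile stocking level.

```python
import numpy as np

# Conceptual illustration of quantile forecasts (not the azureml API):
# simulate possible demand outcomes around a point forecast of 100 units.
rng = np.random.default_rng(seed=0)
point_forecast = 100.0
simulated_demand = point_forecast + rng.normal(0.0, 10.0, size=10_000)

median_forecast = np.quantile(simulated_demand, 0.50)
stock_level_99 = np.quantile(simulated_demand, 0.99)  # "in stock 99% of the time"
print(round(median_forecast, 1), round(stock_level_99, 1))
```

Stocking to the 99th percentile rather than the median is what turns a point forecast into a service-level guarantee.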
You can calculate model metrics like root mean squared error (RMSE) or mean absolute percentage error (MAPE) to help you estimate the model's performance. See the Evaluate section of the Bike share demand notebook for an example. After the overall model accuracy has been determined, the most realistic next step is to use the model to forecast unknown future values. Supply a data set in the same format as the test set
Repeat the necessary steps to load this future data to a dataframe and then run the fitted model's forecast function to predict future values.

Note: In-sample predictions are not supported for forecasting with automated ML when target_lags and/or target_rolling_window_size are enabled.

Forecasting at scale

There are scenarios where a single machine learning model is insufficient and multiple machine learning models are needed. For instance, predicting sales for each individual store for a brand, or tailoring an experience to individual users. Building a model for each instance can lead to improved results on many machine learning problems.

Grouping is a concept in time series forecasting that allows time series to be combined to train an individual model per group. This approach can be particularly helpful if you have time series which require smoothing or filling, or entities in the group that can benefit from history or trends from other entities. Many models and hierarchical time series forecasting are solutions powered by automated machine learning for these large scale forecasting scenarios.

Many models

The Azure Machine Learning many models solution with automated machine learning allows users to train and manage millions of models in parallel. The solution accelerator leverages Azure Machine Learning pipelines to train the model. Specifically, a Pipeline object and ParallelRunStep are used and require specific configuration parameters set through the ParallelRunConfig. The following diagram shows the workflow for the many models solution.

The following code demonstrates the key parameters users need to set up their many models run. See the Many Models - Automated ML notebook for a many models forecasting example.
Hierarchical time series forecasting

In most applications, customers have a need to understand their forecasts at a macro and micro level of the business, whether that be predicting sales of products at different geographic locations, or understanding the expected workforce demand for different organizations at a company. The ability to train a machine learning model to intelligently forecast on hierarchical data is essential.

A hierarchical time series is a structure in which each of the unique series is arranged into a hierarchy based on dimensions such as geography or product type. The following example shows data with unique attributes that form a hierarchy. Our hierarchy is defined by: the product type, such as headphones or tablets; the product category, which splits product types into accessories and devices; and the region the products are sold in.

To further visualize this, the leaf levels of the hierarchy contain all the time series with unique combinations of attribute values. Each higher level in the hierarchy considers one less dimension for defining the time series and aggregates each set of child nodes from the lower level into a parent node.

The hierarchical time series solution is built on top of the many models solution and shares a similar configuration setup. The following code demonstrates the key parameters to set up your hierarchical time series forecasting runs. See the Hierarchical time series - Automated ML notebook for an end to end example.
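The aggregation from leaf nodes to parent nodes can be sketched with pandas. This illustrates the hierarchy structure only, not the HTS training pipeline; the attribute and column names are hypothetical.

```python
import pandas as pd

# Illustrative sketch: leaf-level series (region x category) are rolled up
# the hierarchy to category-level and total-level parent series via sums.
leaf = pd.DataFrame({
    "date": pd.to_datetime(["2023-01-01"] * 4),
    "region": ["East", "East", "West", "West"],
    "category": ["Accessories", "Devices", "Accessories", "Devices"],
    "sales": [10, 40, 20, 30],
})

# One level up: drop the "region" dimension and aggregate its children.
by_category = leaf.groupby(["date", "category"], as_index=False)["sales"].sum()

# Root of the hierarchy: all dimensions dropped, one total series remains.
total = leaf.groupby("date", as_index=False)["sales"].sum()

print(by_category)
print(total)
```

Each step up the hierarchy removes one dimension and sums the child series, which is exactly the parent-node aggregation described above.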
Example notebooks

See the forecasting sample notebooks for detailed code examples of advanced forecasting configuration.
Next steps