Key Guidelines and Best Practices

Last update: September 17, 2025

Index

  1. Understanding the Spatial Resolution
  2. Understanding Output Data
  3. Best Practices for Users
  4. Questions?

Understanding the Spatial Resolution

The climate data in Sustax is provided on a 0.25° × 0.25° latitude-longitude grid. This means that the value of any climate variable (e.g., temperature, precipitation) for a given grid cell represents an average (tas, hurs), an accumulation (pr) or a maximum (sfcWindmax) over the entire cell. This gridded approach is a foundation of modern climate science [25] and is used in global climate models to simulate large-scale climate systems. Note that while it may seem that finer grids would always be “better”, research shows that the relationship between spatial resolution and predictive accuracy is complex: higher resolution does not guarantee higher accuracy, because model performance also depends on other factors such as the parameterization of physical processes and the chaotic nature of the climate system itself [26] [27].

  • Not Point-Specific: The data does not represent the exact conditions at a single, precise point but rather the general conditions within that grid cell.
  • Sub-Grid Variability: Local variations can occur due to microclimates, local topography, land use (urban vs. rural), or the localised nature of some weather phenomena. For instance, a high precipitation value for a grid cell might indicate intense rainfall in one part of the cell, while other parts experienced little to no rain.
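To relate a point of interest to the cell whose values it receives, the sketch below snaps a coordinate to the nearest node of a 0.25° grid and estimates the cell's footprint in kilometres. It assumes, hypothetically, that grid nodes sit on exact multiples of 0.25° (as on the regular ERA5 latitude-longitude grid) and treats the Earth as a sphere; the function names are illustrative and not part of Sustax.

```python
import math

# Grid spacing stated in this guide.
GRID_STEP_DEG = 0.25
# Mean Earth radius in km (spherical approximation).
EARTH_RADIUS_KM = 6371.0


def grid_cell(lat, lon, step=GRID_STEP_DEG):
    """Return the nearest grid node and the (south, north, west, east)
    bounds of the cell centred on it.

    ASSUMPTION: nodes lie on exact multiples of `step`. Check the
    coordinates reported in your export for the node actually used.
    Python's round() resolves exact ties (points on a cell edge) to even.
    """
    node_lat = round(lat / step) * step
    node_lon = round(lon / step) * step
    half = step / 2
    bounds = (node_lat - half, node_lat + half, node_lon - half, node_lon + half)
    return (node_lat, node_lon), bounds


def cell_size_km(lat, step=GRID_STEP_DEG):
    """Approximate north-south and east-west extent (km) of a cell at `lat`."""
    north_south = math.radians(step) * EARTH_RADIUS_KM
    east_west = north_south * math.cos(math.radians(lat))
    return north_south, east_west


# Usage with the coordinates from the LLM example later in this guide.
node, bounds = grid_cell(5.2660, 5.4492)
ns_km, ew_km = cell_size_km(node[0])
print(node, bounds)
print(f"{ns_km:.1f} km x {ew_km:.1f} km")
```

Near the equator such a cell spans roughly 28 km on each side, which is why a valley, a coastline or an urban heat island inside it can differ markedly from the cell value.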

Sustax data is excellent for understanding climate trends, detecting and quantifying climate change, and modelling and comparing risks across different areas. For users requiring finer granularity and/or custom model integration, we provide specialized data:

  1. Free access to Sustax’s invariant variables (geopotential height, soil type, vegetation cover, etc.), which offer static context for advanced downscaling or integration.
  2. The historical period of Sustax’s SSP scenarios, which you can combine with your own local observations (in that case, we recommend bias correction by linear scaling [28]).
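Linear scaling corrects the model's mean bias against observations, typically per calendar month: an additive offset for temperature-like variables and a multiplicative ratio for precipitation. The sketch below is a minimal standard-library version, assuming your station data and the Sustax historical series have already been aligned as (month, value) pairs over the same years; the function names and the numbers are hypothetical, not part of any Sustax tooling.

```python
from collections import defaultdict
from statistics import mean


def monthly_means(series):
    """Group (month, value) pairs by calendar month and average them."""
    groups = defaultdict(list)
    for month, value in series:
        groups[month].append(value)
    return {month: mean(values) for month, values in groups.items()}


def linear_scaling_factors(obs, model, method="additive"):
    """Monthly correction factors from overlapping historical series.

    additive:       factor = mean(obs) - mean(model)   (e.g. tas)
    multiplicative: factor = mean(obs) / mean(model)   (e.g. pr)
    """
    obs_m, mod_m = monthly_means(obs), monthly_means(model)
    common = obs_m.keys() & mod_m.keys()
    if method == "additive":
        return {m: obs_m[m] - mod_m[m] for m in common}
    if method == "multiplicative":
        # Skip months where the model mean is zero (ratio undefined).
        return {m: obs_m[m] / mod_m[m] for m in common if mod_m[m] != 0}
    raise ValueError(f"unknown method: {method!r}")


def apply_linear_scaling(model, factors, method="additive"):
    """Correct (month, value) pairs; raises KeyError if a month has no factor."""
    if method == "additive":
        return [(m, v + factors[m]) for m, v in model]
    if method == "multiplicative":
        return [(m, v * factors[m]) for m, v in model]
    raise ValueError(f"unknown method: {method!r}")


# Illustrative numbers only: January station temperatures vs. the Sustax cell.
factors = linear_scaling_factors([(1, 12.0), (1, 14.0)], [(1, 10.0), (1, 11.0)])
print(apply_linear_scaling([(1, 20.0)], factors))
```

The factors are estimated on the historical overlap and then applied unchanged to the future projection, which is exactly the assumption linear scaling makes: the bias is stationary.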

Understanding Output Data

Sustax output data is delivered in a standardised, homogenised format (see [29] for a visualization of the output CSV file). Whether you’re requesting climate projections for a single location in Mozambique, multiple sites across Europe, or a global analysis spanning continents, every delivered CSV file follows the same structure and formatting conventions. This standardisation means you can develop universal post-processing workflows that work with any Sustax data export, regardless of geographic location or dataset complexity. The standardised format includes:

  • One CSV per POI (Point of Interest), i.e. a single geographical point given as latitude and longitude in WGS84 coordinates
  • Consistent header structure with extensive metadata about your request 
  • The accuracy metrics requested
  • Uniform date formatting (YYYY/MM for monthly data or YYYY/MM/DD for daily data)
  • Standardized variable naming across all climate parameters 
  • Clear units specification for all variables

This consistency eliminates the need to develop location-specific data processing workflows, dramatically reducing the time and effort required to work with climate data from different regions. Whether you’re analyzing temperature trends in Africa or wind patterns in Europe, your analysis code and procedures remain the same, making Sustax data truly plug-and-play for your climate intelligence needs.
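Because every export uses the same date conventions, a single parser can serve all files. The sketch below handles only the two date formats listed above; the function name is hypothetical, and mapping monthly stamps to the first day of the month is an illustrative choice, not a Sustax convention.

```python
from datetime import date


def parse_sustax_date(text):
    """Parse 'YYYY/MM' (monthly) or 'YYYY/MM/DD' (daily) date strings.

    Returns (datetime.date, resolution). Monthly stamps are mapped to the
    first day of the month, an illustrative choice for this sketch.
    """
    parts = text.strip().split("/")
    if len(parts) == 2:
        year, month = (int(p) for p in parts)
        return date(year, month, 1), "monthly"
    if len(parts) == 3:
        year, month, day = (int(p) for p in parts)
        return date(year, month, day), "daily"
    raise ValueError(f"unrecognised Sustax date: {text!r}")


print(parse_sustax_date("2030/07"))
print(parse_sustax_date("2030/07/15"))
```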

Best Practices for Users

Developing Climate Perils and New Variables

The value of Sustax extends beyond its individual variables and pre-calculated indices. By combining the platform’s foundational climate data, users can generate a wide range of customized variables and advanced climate hazard indicators. Some examples include:

  • Fire Weather Index (FWI) [30]: The Canadian Forest Fire Weather Index System is a globally recognised standard for estimating wildfire risk. It can be computed from the Sustax variables Temperature (tas), Relative Humidity (hurs), Wind Speed (sfcWindmax) and 24-hour Precipitation (pr).
  • Heat Stress (THI) [31]: The Temperature-Humidity Index is a widely used index that combines air temperature (tas) and relative humidity (hurs), in some formulations through the wet-bulb temperature, to quantify the level of heat stress on humans and animals.
  • Pluvial Flood Hazard Index (PFI) [32]: A surface water index designed to identify areas at high risk of surface flooding. It combines precipitation data with land-surface characteristics. From Sustax, you can use Daily Total Precipitation (pr) or Monthly Maximum 1-day Precipitation (RX1day) to characterise rainfall intensity, combined with the free static variables soil type (slt), vegetation cover (cvh, cvl) and geopotential height (z) to assess the land’s runoff potential.
  • Evapotranspiration (ET) [33]: Estimated with FAO’s Penman-Monteith equation, using monthly temperature (min, max, mean), total precipitation (pr), relative humidity (hurs), solar irradiance (rsds, soon available), wind (derived from sfcWindmax), geopotential height (z) and vegetation cover (cvh and cvl). The soil heat flux is assumed to be zero.
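As an illustration of deriving a new variable, the sketch below computes a THI from tas and hurs using the NRC (1971) formulation, and a wet-bulb temperature using Stull's (2011) empirical fit. Neither is necessarily the formulation behind reference [31], so check which index definition your application requires. Applying these to monthly means will also smooth out daily heat-stress peaks.

```python
import math


def thi_nrc(t_c, rh_pct):
    """Temperature-Humidity Index, NRC (1971) formulation.

    t_c: air temperature in °C (e.g. Sustax tas);
    rh_pct: relative humidity in % (e.g. Sustax hurs).
    """
    return (1.8 * t_c + 32) - (0.55 - 0.0055 * rh_pct) * (1.8 * t_c - 26)


def wet_bulb_stull(t_c, rh_pct):
    """Wet-bulb temperature (°C) from Stull's (2011) empirical fit.

    Intended for roughly 5-99 % RH and -20 to 50 °C near sea-level pressure.
    """
    return (t_c * math.atan(0.151977 * math.sqrt(rh_pct + 8.313659))
            + math.atan(t_c + rh_pct)
            - math.atan(rh_pct - 1.676331)
            + 0.00391838 * rh_pct ** 1.5 * math.atan(0.023101 * rh_pct)
            - 4.686035)


for tas, hurs in [(20.0, 50.0), (30.0, 50.0), (35.0, 70.0)]:
    print(f"tas={tas} hurs={hurs}: THI={thi_nrc(tas, hurs):.1f}, "
          f"Tw={wet_bulb_stull(tas, hurs):.1f} °C")
```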

Managing Uncertainty

Don’t rely solely on a single “best estimate”. Embrace uncertainty: consider the provided accuracy metrics and the model spread for specific projections, and analyse data across multiple relevant SSP-RCP scenarios to understand the potential range of outcomes and the associated uncertainties.

The Sustax variable Model Spread quantifies the original uncertainty within the ensemble used to generate each variable. However, because some SSP-RCP scenarios were generated with more simulations than others, directly comparing the absolute spread values between scenarios can be misleading (see the Specifics of the Uncertainty section). For the most accurate interpretation, use the Model Spread to:

  1. Track the Evolution of Uncertainty: Analyse how the spread for a single SSP-RCP scenario changes over time. An increasing spread signals growing divergence among the original ensemble members used for that SSP-RCP scenario.
  2. Compare Uncertainty Trends / Slopes: Evaluate the rate of change (or slope) of the Model Spread time series across different SSP-RCP scenarios. This lets you compare how quickly uncertainty is expected to grow in one plausible scenario versus another, in a way that makes the number of simulations much less relevant.

Managing Sustax Scenarios (i.e. SSPs and ERA5)

Use all seven scenarios to conduct sensitivity analyses of climate impacts. The scenarios are another source of uncertainty to consider: while the narratives behind the SSPs provide a strong framework, it is recognised within climate science that projections of certain variables, such as precipitation and wind gusts, carry higher uncertainty than projections of temperature (compare the IPCC’s ‘high confidence’ assessments for temperature with its ‘medium confidence’ assessments for precipitation and wind events [34]). We recommend using the full range of Sustax scenarios to benchmark all possibilities and build a resilient strategy that accounts for this spectrum of uncertainty.

Use the ERA5 “scenario” for custom benchmarking. This allows you to calculate bespoke metrics relevant to your industry, or to compare model outputs against a trusted “ground-truth” dataset for a given period or season, building an even deeper level of confidence and understanding. ERA5 can also serve as a baseline against which to measure the change projected by the SSPs. Most importantly, ERA5 can be used to assess immediate physical risks (e.g., over the next 1 to 5 years).
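A minimal sketch of using ERA5 as a baseline: the mean of an SSP scenario over a future period minus the ERA5 mean over a historical reference period, optionally restricted to a season. The (year, month, value) record layout, the function names and the default periods (1991–2020 baseline, 2041–2060 future) are assumptions for illustration, and the series below are synthetic.

```python
from statistics import mean


def period_mean(records, start_year, end_year, months=None):
    """Mean of (year, month, value) records with start_year <= year <= end_year,
    optionally restricted to a set of calendar months (a season)."""
    values = [value for year, month, value in records
              if start_year <= year <= end_year
              and (months is None or month in months)]
    if not values:
        raise ValueError("no records in the requested period/season")
    return mean(values)


def change_vs_era5(era5, scenario, baseline=(1991, 2020), future=(2041, 2060),
                   months=None):
    """Scenario mean over `future` minus ERA5 mean over `baseline`."""
    return (period_mean(scenario, *future, months=months)
            - period_mean(era5, *baseline, months=months))


# Synthetic constant series, for illustration only.
era5 = [(y, m, 15.0) for y in range(1991, 2021) for m in range(1, 13)]
ssp = [(y, m, 17.0) for y in range(2041, 2061) for m in range(1, 13)]
print(change_vs_era5(era5, ssp))                    # annual change
print(change_vs_era5(era5, ssp, months={6, 7, 8}))  # June-August only
```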

Finally, you can download ERA5 data up to the most recent available date from the official Copernicus API and use it to validate the accuracy of Sustax’s projections from 2023 onwards.

The Accuracy Metrics

Combine different metrics for a holistic accuracy assessment: no single metric tells the whole story, and using the suite of accuracy metrics together gives a more nuanced picture of model behavior. For example:

  • Look Beyond the Average: A low Mean Bias Error (MBE) is good, but check it against the Mean Absolute Error (MAE). A low MBE with a high MAE can indicate that large positive and negative errors are cancelling each other out.
  • Assess Trends and Distributions: A high Pearson R shows that the model correctly captures the temporal variations (the ups and downs over time), but the MBE will tell you whether it systematically over- or under-estimates the absolute values. Use the Energy and Wasserstein distances to confirm that the model realistically captures the full distribution of outcomes, including the likelihood of extremes.

Use the metrics as relative measures: Sustax’s accuracy metrics are designed as comparative tools that help you make informed decisions about which SSP-RCP scenarios best suit your analysis. The metrics were estimated over the 40-year training period. Given their nature, you should:

  • Use Metrics for Relative Comparison: The primary role of these metrics is to allow you to compare the historical performance of different post-processed SSP-RCP scenarios.
  • Validate for Definitive Accuracy: While historical performance is an indicator, a more definitive measure of accuracy is how the projections perform in the predictive period. Use Sustax’s projections from 2023 onwards and validate them against the corresponding ERA5 data as it becomes available.

Further refining: Empirical Downscaling

Empirical downscaling is a commonly used technique, yet we advise great caution when using it. First, empirical downscaling can rely on assumptions that usually do not hold [35]; more importantly, the uncertainty of the original grid cell may simply be propagated (i.e., not resolved) or, worse, amplified [36]. It is a common misconception that higher spatial resolution always means higher accuracy. In fact, studies have shown that a coarser dataset (such as ERA5 at ~31 km, the baseline used in Sustax) can exhibit higher skill for certain variables than its higher-resolution counterpart (such as ERA5-Land at ~9 km) [37].

To help you move beyond the limitations of simple empirical methods, Sustax provides a suite of static geospatial variables, including soil type, geopotential height, vegetation cover and vegetation height. This data can be used for further assessment of climate risks or for locally adapted empirical downscaling techniques. For example:

  • Geopotential Height at Surface: Account for orography in your high-resolution vulnerability analysis.
  • Soil Type: Incorporate soil characteristics to refine hydrological models, such as runoff simulations or drought risk evaluations.
  • Vegetation Type and Density: Leverage vegetation data to enhance ecosystem impact studies, such as assessing carbon sequestration or wildfire susceptibility under changing climates.
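A minimal sketch of one locally adapted technique: shifting a grid-cell temperature to a site at a different elevation using the cell's surface height and a constant lapse rate. The 6.5 K/km standard-atmosphere lapse rate is an assumption (real lapse rates vary with season, humidity and terrain), and if your z field is surface geopotential in m² s⁻² rather than a height in metres, it must first be divided by standard gravity. As cautioned above, this shifts the level of the series but does not remove the grid cell's own uncertainty. Function names and values are illustrative.

```python
# Standard gravity (m s-2), to convert geopotential to geopotential height.
STANDARD_GRAVITY = 9.80665
# ASSUMPTION: constant standard-atmosphere lapse rate of 6.5 K per km.
DEFAULT_LAPSE_RATE_K_PER_M = 0.0065


def geopotential_to_height(z_m2_s2):
    """Convert surface geopotential (m2 s-2) to geopotential height (m)."""
    return z_m2_s2 / STANDARD_GRAVITY


def lapse_rate_adjust(t_cell_c, cell_height_m, site_height_m,
                      lapse_rate=DEFAULT_LAPSE_RATE_K_PER_M):
    """Shift a grid-cell temperature (°C) to a site at a different elevation.

    Higher sites get cooler values, lower sites warmer ones. This changes the
    level of the series only; it does not reduce the cell's uncertainty.
    """
    return t_cell_c - lapse_rate * (site_height_m - cell_height_m)


# Illustrative values: a cell whose surface geopotential corresponds to ~500 m
# and a site at 1,500 m on a nearby slope.
cell_height = geopotential_to_height(4903.325)
print(round(cell_height, 1))
print(round(lapse_rate_adjust(20.0, cell_height, 1500.0), 2))
```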

You can also supplement Sustax data with more detailed local studies or data, such as local publications and reports, previous experience, vulnerability assessments, or data from weather stations, rain gauges and satellites. For instance:

  • In agriculture, overlaying Sentinel-2 crop health data could refine Sustax’s drought projections, highlighting field-level vulnerabilities.
  • For city management, integrating a municipality’s asset maps or energy consumption maps (thermal imagery from Landsat 9’s TIRS-2 payload can help here) with Sustax’s heatwave projections could help assess cooling demand at neighbourhood scale.

Further refining: Bias Correction and Supervised Learning

We encourage you to enhance Sustax’s SSP-RCP projections by integrating your own local data. If you have access to local weather observations (e.g., from on-site stations) or high-quality satellite-derived data (such as CHIRPS for precipitation or SAFDAT for temperature), you can use this information to further refine Sustax’s model outputs.

To perform this calibration, you will need to download the complete time series for the relevant SSP-RCP scenario(s), covering both the historical and future periods. A robust approach for splitting your data is as follows:

  • Training and Validation Period (e.g., ~2000–2022): Use this period to establish the scaling factor or train the algorithm.
  • Cross-Validation/Testing Period (e.g., 2022–2025): Apply the trained algorithm to this independent period to test its robustness.
  • Prediction Period (2025 onwards): Apply the algorithm to the future projections to generate your refined local data.
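The split above can be expressed with half-open year ranges, which is one way (an assumption on our part) to read the shared boundary years: 2022 falls in the testing period and 2025 in the prediction period, so no year is used twice. The function name and the (year, value) record layout are hypothetical.

```python
def split_by_period(records, train_start=2000, test_start=2022, predict_start=2025):
    """Split (year, value) records into training [train_start, test_start),
    testing [test_start, predict_start) and prediction [predict_start, ...).

    Records before train_start are dropped. Default years follow the
    example periods in this guide; adjust them to your data.
    """
    train, test, predict = [], [], []
    for year, value in records:
        if train_start <= year < test_start:
            train.append((year, value))
        elif test_start <= year < predict_start:
            test.append((year, value))
        elif year >= predict_start:
            predict.append((year, value))
    return train, test, predict


# Illustrative yearly series (values are placeholders).
series = [(year, 0.0) for year in range(1995, 2031)]
train, test, predict = split_by_period(series)
print(len(train), len(test), len(predict))
```

Fit your scaling factor or model on `train` only, score it on `test`, and apply it to `predict` once the test score is acceptable.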

By offering both future projections and a historical foundation for each SSP scenario, Sustax supports an enormous range of analytical needs, including those requiring further localised refinement.

Further refining: Large Language Models and Sustax

The data exported from Sustax is designed to be machine-readable and rich in context, which is ideal for advanced analysis. Interacting with this data using Large Language Models (LLMs) can unlock powerful new insights, but it requires a strategic approach.

To get the best results when using Sustax CSV files with an LLM, don’t simply upload the file on its own. The complex header needs a little preparation so that the model understands the context:

  1. Use the Official Sustax Python Interpreter: Before passing a Sustax CSV to the LLM, first provide it with the official Python code available in this Documentation Hub. This step helps the LLM interpret the CSV structure correctly; when prompted in this order, the LLM can interpret Sustax data seamlessly. For optimal results, we recommend Anthropic’s Claude, based on our experience.
  2. Provide Context in Your Prompt: The most effective way to use this data is to “teach” the LLM about your specific file. Start your prompt by providing the key metadata you just extracted. Example Prompt: “I am analyzing climate data for Ofuogbene, Nigeria (Lat: 5.2660, Lon: 5.4492). The attached data includes columns like ‘AvMT’ (Average Temperature) and ‘RT’ (Monthly Accumulated Precipitation Total). Please perform the following task: [Your question here].”
  3. Be Specific with Your Questions: An LLM works best with clear, targeted instructions. Instead of asking it to “analyze the file,” ask it to perform a specific task.
    • Good: “Using the provided data, create a summary table showing the average ‘AvMT’ for the ssp585 scenario for the decades 2030-2039 and 2040-2049.”
    • Good: “Please generate Python code to plot the ‘RT’ and ‘AvMT’ for the era5 scenario from 1980 to 2020.”
    • Less Effective: “What does this data say?”
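Step 2 can be automated by assembling the context sentence from the metadata you extracted. The sketch below only formats a prompt string following the example prompt above; the function name and arguments are hypothetical, and the column descriptions must come from your own file's header.

```python
def build_context_prompt(location, lat, lon, columns, task):
    """Assemble a context-rich prompt from metadata extracted from a Sustax CSV.

    `columns` maps column codes to human-readable descriptions taken from
    your own file's header (the pairs below are from this guide's example).
    """
    column_text = ", ".join(f"'{code}' ({desc})" for code, desc in columns.items())
    return (
        f"I am analyzing climate data for {location} (Lat: {lat:.4f}, Lon: {lon:.4f}). "
        f"The attached data includes columns like {column_text}. "
        f"Please perform the following task: {task}"
    )


prompt = build_context_prompt(
    "Ofuogbene, Nigeria",
    5.2660,
    5.4492,
    {"AvMT": "Average Temperature", "RT": "Monthly Accumulated Precipitation Total"},
    "Create a summary table showing the average 'AvMT' for the ssp585 scenario "
    "for the decades 2030-2039 and 2040-2049.",
)
print(prompt)
```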

Other Pro Tips

Sustax provides climate projections (long-term statistical likelihoods of climate conditions) and should not be confused with short-term weather forecasts (predicting specific weather events on specific days in the near future). The data is designed for assessing long-term climate change impacts, risks, and adaptation. Ensure its application aligns with this purpose. 

While Sustax provides robust climate data, incorporating local knowledge and expertise (e.g., local observations, specific infrastructure vulnerabilities, community needs) is often crucial for the most effective adaptation planning. 

Finally, we recommend consulting the Sustax documentation for specific definitions of (daily) variables, (monthly) climate indices, scenarios and metrics. For complex applications, or if you are unsure how to interpret specific data, consider consulting climate adaptation specialists from Geoskop or the Sustax support team.

Questions?

If you have further questions about using Sustax data, please consult our FAQ or contact Sustax Support at info@geoskop.tech