Interactive Index
- The CSV Output File
- Import Data with Microsoft Excel
- Import Data with Python
- Import Data with R
- Importing and Interpreting with Large Language Models
- Sustax User’s API
The CSV Output File
Each Sustax request returns one CSV file per gridcell or POI (Point of Interest), encoded in UTF-8 with fields separated by commas. If the user asks for monthly data and daily data in the same request, they receive two CSV files. Each CSV consists of three main structures:

1. Metadata (free of cost)
Includes the CSV creation date, a precise location in geographical coordinates (WGS84), an approximate OpenStreetMap location, the climate variables requested (short name, long name and units), the accuracy metrics requested (short name and long name) and the scenarios requested (SSPs or ERA5). Each request also includes, free of cost, a set of static variables representing the interpreted geophysical features of the gridcell:
- CVH. Estimated high cover vegetation fraction (%)
- CVL. Estimated low cover vegetation fraction (%)
- TVH. Type of high vegetation (~)
- TVL. Type of low vegetation (~)
- SLT. Soil type (~)
- Z. Estimated geopotential height (m)
- SDOR. Standard deviation of sub-gridscale orography (~)
- SLOR. Slope of sub-gridscale orography (~)
2. Accuracy metrics (payload)
The predictive accuracy metrics have a key characteristic: they are timeless. There are five metrics, each with a single value per climate variable and per SSP-RCP scenario.
3. Climate Data (payload)
The climate data consist of time series at daily or monthly resolution. For each climate scenario and variable, a time series is generated that can contain up to 102 years of data. Since users can request data with monthly or daily time stamps, two separate CSVs are generated if the user's request includes datasets at both temporal resolutions.
If the user requests an ROI (Region of Interest) instead of a POI (Point of Interest), they will receive multiple CSVs, one for each POI contained within the requested ROI. In any case, the requested CSV(s) are delivered in a compressed ZIP file.
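Since the delivery is always a compressed ZIP, a scripted workflow usually starts by extracting the CSVs. A minimal Python sketch (the archive name sustax_request.zip is illustrative, not a fixed Sustax convention):

```python
import zipfile
from pathlib import Path

def extract_sustax_zip(zip_path, out_dir="."):
    """Extract a Sustax delivery ZIP and return the paths of the CSVs inside."""
    with zipfile.ZipFile(zip_path) as zf:
        zf.extractall(out_dir)
        # Keep only the CSV members; an ROI request yields one CSV per POI
        return sorted(Path(out_dir) / name
                      for name in zf.namelist() if name.lower().endswith(".csv"))
```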
Import Data with Microsoft Excel
Since Sustax CSVs are encoded as UTF-8 with BOM, Excel usually detects the encoding automatically.
Desktop Excel
- Open Excel (desktop version).
- Go to Data → Get Data → From Text/CSV.
- Select your CSV file and click Import.
- In the Import Wizard:
- File Origin: Choose 65001: Unicode (UTF-8).
- Delimiter: Select Comma (,).
- Click Load (or Transform Data if you want to edit in Power Query).
- Your data will appear in separate columns.
Online Excel
- Upload the CSV to OneDrive.
- Open it in Excel Online.
- If Excel detects the delimiter automatically, it will split columns.
- If not, select the entire text column:
- Click on the Data tab.
- In Data Tools, click Split Text to Columns and set the delimiter to comma (,).
If you see strange characters in the metadata (e.g., ñ, accents, other non-ASCII characters), ensure you selected UTF-8 during import. If your Excel is not configured with the dot ‘.’ as decimal separator, the data will be displayed incorrectly (e.g., Spanish versions of Excel, which use ‘,’ as decimal separator).
Import Data with Python
The script requires Python 3.6 or above and two external libraries:
- Numpy
- Pandas
The script consists of a single function, with all necessary imports placed inside it.
def load_sustax_file(csv_stx, return_pandas_df = True, return_metadata = False):
    """
    Pull Sustax CSV data into Python as numpy arrays or as Pandas dataframes

    Parameters:
    ----------
    - csv_stx: Str. or Path obj. Full file name pointing to Sustax CSV
    - return_pandas_df: Bool. Use bool(True) to get outputs as Pandas obj.
    - return_metadata: Bool. Use bool(True) to get CSV metadata

    Returns:
    -------
    Returns two, three or four objects depending on `return_pandas_df` and
    `return_metadata` parameters:
    1. Climate data: All `Data requested` field from the CSV file
    2. Accuracy metrics: All `Metrics requested` field from the CSV file
    3. Climate data Time Stamps: All timesteps in `Data requested` field
    4. Metadata: Dictionary object containing all the CSV metadata
    The climate data and accuracy metrics can be returned as dictionaries
    with arrays or as Pandas dataframes
    If `return_pandas_df` is requested, Time Stamps are merged with the
    climate data automatically

    External packages required:
    --------------------------
    - Numpy
    - Pandas
    """
    # Imports kept "inside" for compactness. We encourage placing all imports
    # at the beginning of your module / script file
    import csv
    import decimal
    import numpy as np
    import pandas as pd
    from datetime import datetime

    def _isfloat(string):
        try:
            decimal.Decimal(string)
            return True
        except decimal.InvalidOperation:
            return False

    # Sustax CSVs are UTF-8 with BOM, hence 'utf-8-sig'
    with open(csv_stx, 'r', encoding = 'utf-8-sig') as fobj:
        data = csv.reader(fobj, delimiter = ',')
        data = [i for i in data]

    if return_metadata:
        mt = {}
        for r in data:
            mt["creation_date"] = data[1][1]
            if any(['longitude' in h.lower() for h in r]):
                mt["lon"] = [float(rr) for rr in r if _isfloat(rr)][0]
            if any(['latitude' in h.lower() for h in r]):
                mt["lat"] = [float(rr) for rr in r if _isfloat(rr)][0]
            if any(['soil variables' in h.lower() for h in r]):
                idx_loc = [c for c in range(len(data)) if 'Soil variables:' in data[c]][0]
                mt["soil_data"] = {}
                for c in range(1, len(r)):
                    if r[c]:
                        mt["soil_data"].update({data[idx_loc][c]: data[idx_loc+1][c]})

    # Check if daily or monthly dataset, get the whole string
    dts_type = [d for d in data if (len(d) > 0) and (d[0] == 'Dataset:')][0][1]

    # Get climate data
    # The data values start four rows below the 'Data requested:' marker
    idx_data = [c for c in range(len(data))
                if 'Data requested:' in data[c]][0] + 4
    all_data = data[idx_data:]
    all_data_vars = data[idx_data-3]
    all_data_scenarios = data[idx_data-2]
    if return_metadata:
        mt["var_us"] = [r for r in data[idx_data-1] if r]
        mt["var_scs"] = [r for r in data[idx_data-2] if r and (r.lower() != 'time')]
        mt["var_nm"] = [r.split('-')[0] for r in data[idx_data-3] if r]
        try:
            # In this case returns the unique long name
            idx_long = [c for c in range(len(data)) if 'Climate variables long name:' in data[c]][0]
            mt["var_l_nm"] = data[idx_long][1].split(',')
        except (IndexError, TypeError):
            pass

    # Get metrics data
    if [d for d in data if (len(d) > 0) and
            ('Metric variables:' in d)][0][1] != '':
        # Metric values start two rows below the 'Metrics requested:' marker
        # and end five rows above the data block
        idx_metrics = [c for c in range(len(data))
                       if 'Metrics requested:' in data[c]][0] + 2
        all_metrics = data[idx_metrics:idx_data - 5]
    else:
        all_metrics = {}
    dt_metrics = {}
    for r in all_metrics:
        metric_nm = r[0].split(' - ')[0]
        var_nm = r[0].split(' - ')[1]
        if var_nm not in dt_metrics:
            dt_metrics.update({var_nm: {}})
        dt_metrics[var_nm].update({metric_nm: {}})
        for c in range(1, len(r)):
            if _isfloat(r[c]):
                dt_metrics[var_nm][metric_nm][all_data_scenarios[c]] = float(r[c])

    # Get climate payload data
    dt = {}
    for c in range(len(all_data_vars)):
        if all_data_vars[c] != '':
            dt.setdefault(all_data_vars[c], {}).update({all_data_scenarios[c]: []})
    # Load all data to local variables
    time = []
    for r in all_data:
        for c in range(len(r)):
            if ("SSP" in all_data_scenarios[c]) or ("ERA" in all_data_scenarios[c]):
                dt[all_data_vars[c]][all_data_scenarios[c]].append(float(r[c]) if r[c] != '' else float('NaN'))
        time.append(datetime.strptime(r[0], '%Y/%m' if 'monthly' in dts_type.lower()
                                      else "%Y/%m/%d"))

    # Payload data to array
    time = np.array(time, dtype = f'datetime64[{"M" if "monthly" in dts_type.lower() else "D"}]')
    dt = {k: {s: np.asarray(dt[k][s]) for s in dt[k]} for k in dt}

    if return_pandas_df:
        all_dfs = [pd.Series({d: v for d, v in zip(time, dt[var][s])}).to_frame(name = f"{var} [{s}]")
                   for var in dt for s in dt[var]]
        df_vals = all_dfs[0].join(all_dfs[1:])
        all_dfs = [pd.Series({v: dt_metrics[var][s][v] for v in dt_metrics[var][s]}).to_frame(name = f"{var} [{s}]")
                   for var in dt_metrics for s in dt_metrics[var]]
        # Return an empty DF if the CSV does not contain metrics
        df_metrics = all_dfs[0].join(all_dfs[1:]) if all_dfs else pd.DataFrame()
        if return_metadata:
            return df_vals, df_metrics, mt
        else:
            return df_vals, df_metrics
    else:
        if return_metadata:
            return dt, dt_metrics, time, mt
        else:
            return dt, dt_metrics, time
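With `return_pandas_df=True`, the climate data arrive as a time-indexed dataframe with one column per variable-scenario pair, which makes downstream aggregation straightforward. Below is a self-contained sketch using synthetic data of the same shape (the column name `TX [SSP5-8.5]` is illustrative); on a real file you would obtain `df_vals` from `load_sustax_file` instead:

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the df_vals returned by load_sustax_file:
# two years of monthly values for one variable under one scenario
idx = pd.date_range("2020-01-01", periods=24, freq="MS")
df_vals = pd.DataFrame({"TX [SSP5-8.5]": np.linspace(10.0, 20.0, 24)}, index=idx)

# Annual means per variable-scenario column, as you would compute on real output
annual = df_vals.resample("YS").mean()
print(annual)
```

The same `resample` call works unchanged on any of the variable-scenario columns returned by the loader.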
Import Data with R
The code below requires R 4.4.1 or above:
load_sustax_file <- function(csv_stx, return_metadata = FALSE) {
  #' Load Sustax CSV data into R with multi-level column names
  #'
  #' @param csv_stx Character string. Full file path to Sustax CSV
  #' @param return_metadata Logical. TRUE to include metadata (default: FALSE)
  #' @return List with climate_data, accuracy_metrics, and metadata
  #' Both data frames have multi-level column names

  is_float <- function(x) {
    !is.na(suppressWarnings(as.numeric(x))) && nchar(trimws(x)) > 0
  }

  con <- file(csv_stx, "r", encoding = "UTF-8")
  lines <- readLines(con, warn = FALSE)
  close(con)

  data <- lapply(lines, function(line) {
    con_line <- textConnection(line)
    parsed <- tryCatch({
      as.character(read.csv(con_line, header = FALSE, stringsAsFactors = FALSE,
                            check.names = FALSE, na.strings = NULL)[1, ])
    }, error = function(e) strsplit(line, ",")[[1]])
    close(con_line)
    return(parsed)
  })

  # Extract metadata (if requested)
  if (return_metadata) {
    mt <- list()
    for (i in seq_along(data)) {
      r <- data[[i]]
      if (i == 2 && length(r) >= 2) mt$creation_date <- r[2]
      if (any(grepl("longitude", r, ignore.case = TRUE))) {
        nums <- suppressWarnings(as.numeric(r))
        nums <- nums[!is.na(nums)]
        if (length(nums) > 0) mt$lon <- nums[1]
      }
      if (any(grepl("latitude", r, ignore.case = TRUE))) {
        nums <- suppressWarnings(as.numeric(r))
        nums <- nums[!is.na(nums)]
        if (length(nums) > 0) mt$lat <- nums[1]
      }
      if (any(grepl("soil variables", r, ignore.case = TRUE))) {
        idx_loc <- which(sapply(data, function(x) any(grepl("^Soil variables:", x))))[1]
        if (!is.na(idx_loc) && idx_loc < length(data)) {
          soil_header <- data[[idx_loc]]
          soil_values <- data[[idx_loc + 1]]
          mt$soil_data <- list()
          for (c in 2:length(soil_header)) {
            if (c <= length(soil_values) &&
                nchar(trimws(soil_header[c])) > 0 &&
                nchar(trimws(soil_values[c])) > 0) {
              mt$soil_data[[soil_header[c]]] <- soil_values[c]
            }
          }
        }
      }
    }
  }

  dts_type <- ""
  for (d in data) {
    if (length(d) > 0 && d[1] == "Dataset:") {
      dts_type <- d[2]
      break
    }
  }

  idx_data <- which(sapply(data, function(x) any(grepl("^Data requested:", x))))[1] + 4
  all_data_vars <- data[[idx_data - 3]]
  all_data_scenarios <- data[[idx_data - 2]]
  all_data_units <- data[[idx_data - 1]]

  # Store variable metadata
  if (return_metadata) {
    mt$var_us <- all_data_units[nchar(trimws(all_data_units)) > 0]
    mt$var_scs <- all_data_scenarios[nchar(trimws(all_data_scenarios)) > 0 &
                                       tolower(all_data_scenarios) != "time"]
    mt$var_nm <- sapply(all_data_vars[nchar(trimws(all_data_vars)) > 0], function(x) {
      if (grepl("-", x, fixed = TRUE)) strsplit(x, "-", fixed = TRUE)[[1]][1] else x
    }, USE.NAMES = FALSE)
    idx_long <- which(sapply(data, function(x)
      length(x) > 0 && any(grepl("Climate variables long name:", x))))[1]
    if (!is.na(idx_long)) {
      long_str <- paste(data[[idx_long]][-1], collapse = ",")
      long_str <- gsub('"', '', long_str)
      mt$var_l_nm <- strsplit(long_str, ",")[[1]]
    }
  }

  # Check for metrics
  metric_var_idx <- which(sapply(data, function(x)
    length(x) > 0 && any(grepl("^Metric variables:", x))))[1]
  has_metrics <- FALSE
  if (!is.na(metric_var_idx)) {
    metric_check <- data[[metric_var_idx]]
    if (length(metric_check) > 1 && nchar(trimws(metric_check[2])) > 0) {
      has_metrics <- TRUE
    }
  }

  # Parse METRICS with multi-level headers
  dt_metrics <- list()
  metrics_col_info <- list()
  if (has_metrics) {
    idx_metrics <- which(sapply(data, function(x)
      any(grepl("^Metrics requested:", x))))[1] + 2
    if (!is.na(idx_metrics) && idx_metrics < idx_data - 5) {
      all_metrics <- data[idx_metrics:(idx_data - 5)]
      for (r in all_metrics) {
        if (length(r) > 0 && nchar(trimws(r[1])) > 0 && grepl(" - ", r[1], fixed = TRUE)) {
          parts <- strsplit(r[1], " - ", fixed = TRUE)[[1]]
          metric_nm <- parts[1]
          var_nm <- parts[2]
          key <- paste0(var_nm, "_", metric_nm)
          dt_metrics[[key]] <- list()
          metrics_col_info[[key]] <- list(variable = var_nm, metric = metric_nm)
          for (c in 2:length(r)) {
            if (c <= length(all_data_scenarios) && is_float(r[c])) {
              scenario <- all_data_scenarios[c]
              if (nchar(trimws(scenario)) > 0) {
                dt_metrics[[key]][[scenario]] <- as.numeric(r[c])
              }
            }
          }
        }
      }
    }
  }

  # Parse CLIMATE data
  all_data <- data[idx_data:length(data)]
  dt <- list()
  col_info <- list()
  for (c in seq_along(all_data_vars)) {
    var_name <- all_data_vars[c]
    scenario <- all_data_scenarios[c]
    if (nchar(trimws(var_name)) > 0 &&
        nchar(trimws(scenario)) > 0 &&
        (grepl("SSP", scenario) || grepl("ERA", scenario))) {
      key <- paste0(var_name, "_", scenario)
      dt[[key]] <- numeric(0)
      col_info[[key]] <- list(variable = var_name, scenario = scenario)
    }
  }

  # Extract timestamps and values
  time <- character(0)
  is_monthly <- grepl("monthly", dts_type, ignore.case = TRUE)
  for (r in all_data) {
    if (length(r) > 0 && nchar(trimws(r[1])) > 0) {
      time <- c(time, r[1])
      for (c in 2:length(r)) {
        if (c <= length(all_data_scenarios) && c <= length(all_data_vars)) {
          var_name <- all_data_vars[c]
          scenario <- all_data_scenarios[c]
          if (nchar(trimws(var_name)) > 0 &&
              nchar(trimws(scenario)) > 0 &&
              (grepl("SSP", scenario) || grepl("ERA", scenario))) {
            key <- paste0(var_name, "_", scenario)
            value <- ifelse(nchar(trimws(r[c])) > 0,
                            suppressWarnings(as.numeric(r[c])),
                            NA_real_)
            if (key %in% names(dt)) {
              dt[[key]] <- c(dt[[key]], value)
            }
          }
        }
      }
    }
  }

  # Convert timestamps
  if (is_monthly) {
    time_parsed <- as.Date(paste0(time, "/01"), format = "%Y/%m/%d")
  } else {
    time_parsed <- as.Date(time, format = "%Y/%m/%d")
  }

  # Build CLIMATE dataframe with multi-level headers
  if (length(dt) > 0) {
    df_list <- list(Time = time_parsed)
    for (key in names(dt)) {
      full_var <- col_info[[key]]$variable
      if (grepl("-", full_var, fixed = TRUE)) {
        short_var <- strsplit(full_var, "-", fixed = TRUE)[[1]][1]
      } else {
        short_var <- full_var
      }
      scenario <- col_info[[key]]$scenario
      col_name <- paste0(short_var, "\n", scenario)
      df_list[[col_name]] <- dt[[key]]
    }
    df_vals <- as.data.frame(df_list, stringsAsFactors = FALSE)
    var_names <- c("Time", sapply(names(dt), function(key) {
      full_var <- col_info[[key]]$variable
      if (grepl("-", full_var, fixed = TRUE)) {
        strsplit(full_var, "-", fixed = TRUE)[[1]][1]
      } else {
        full_var
      }
    }))
    scenario_names <- c("", sapply(names(dt), function(key) col_info[[key]]$scenario))
    attr(df_vals, "variable_names") <- var_names
    attr(df_vals, "scenario_names") <- scenario_names
  } else {
    df_vals <- data.frame(Time = time_parsed)
  }

  # Build METRICS dataframe with multi-level headers
  if (length(dt_metrics) > 0) {
    all_scenarios <- unique(unlist(lapply(dt_metrics, names)))
    metrics_df_list <- list()
    for (key in names(dt_metrics)) {
      var_nm <- metrics_col_info[[key]]$variable
      metric_nm <- metrics_col_info[[key]]$metric
      col_name <- paste0(var_nm, "\n", metric_nm)
      values <- sapply(all_scenarios, function(sc) {
        if (sc %in% names(dt_metrics[[key]])) {
          dt_metrics[[key]][[sc]]
        } else {
          NA_real_
        }
      })
      metrics_df_list[[col_name]] <- values
    }
    df_metrics <- as.data.frame(metrics_df_list, stringsAsFactors = FALSE)
    rownames(df_metrics) <- all_scenarios
    var_names_metrics <- sapply(names(dt_metrics), function(key) metrics_col_info[[key]]$variable)
    metric_names_metrics <- sapply(names(dt_metrics), function(key) metrics_col_info[[key]]$metric)
    attr(df_metrics, "variable_names") <- var_names_metrics
    attr(df_metrics, "metric_names") <- metric_names_metrics
  } else {
    df_metrics <- data.frame()
  }

  if (return_metadata) {
    return(list(climate_data = df_vals, accuracy_metrics = df_metrics, metadata = mt))
  } else {
    return(list(climate_data = df_vals, accuracy_metrics = df_metrics))
  }
}
Importing and Interpreting with Large Language Models
Sustax climate data is delivered in structured CSV files containing metadata, accuracy metrics, and time-series data. Effectively interpreting this data with large language models (LLMs) relies not only on importing it but also on understanding its structure and content.
LLMs excel at automating the interpretation of diverse data sections by generating parsing and cleaning code and providing initial analyses or summaries. This greatly reduces manual data preparation and error risks, making climate risk assessment more efficient.
To facilitate meaningful interpretation:
- Provide the LLM with context by granting access to the Sustax Documentation Hub before processing data.
- There are two alternative ways for the LLM to interpret the CSV file:
- For textual context: Split the CSV into manageable parts: (1) the metadata as plain text to preserve descriptive context (e.g., metadata.txt), and (2) the accuracy metrics and (3) the time-series sections as tabular CSVs (e.g., accuracy_metrics.csv, climate_timeseries_data.csv) to maintain structure.
- For analytical context: Copy the official loader function from Sustax's Documentation Hub (above) and paste it directly into the LLM prompt as text. The generative AI system will interpret it and gain immediate access to clean data structures for aggregation, plotting, or model evaluation. This way you do not need to split the original Sustax CSV.
- Test the LLM's understanding of the data by asking questions such as "Where is this data from?" or "What period does the time series cover?"
- Prefer summarised or monthly data extracts over more extensive daily data when possible: LLMs do not directly "ingest" files; they process text via prompts, APIs, or integrated tools (e.g., Python code execution).
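The textual-context split described above can be automated. The sketch below assumes the section marker rows `Metrics requested:` and `Data requested:` used by the loader functions in this hub; verify the markers against your own file before relying on it:

```python
from pathlib import Path

def split_sustax_csv(csv_path, out_dir="."):
    """Split a Sustax CSV into metadata.txt, accuracy_metrics.csv and
    climate_timeseries_data.csv, using the section marker rows."""
    lines = Path(csv_path).read_text(encoding="utf-8-sig").splitlines(keepends=True)
    # Locate the assumed section markers
    i_metrics = next(i for i, l in enumerate(lines) if l.startswith("Metrics requested:"))
    i_data = next(i for i, l in enumerate(lines) if l.startswith("Data requested:"))
    out = Path(out_dir)
    (out / "metadata.txt").write_text("".join(lines[:i_metrics]), encoding="utf-8")
    (out / "accuracy_metrics.csv").write_text("".join(lines[i_metrics:i_data]), encoding="utf-8")
    (out / "climate_timeseries_data.csv").write_text("".join(lines[i_data:]), encoding="utf-8")
    return i_metrics, i_data
```

Checking that the three parts' row counts add up to the original file's row count is a quick way to validate the split, in line with the testing practice below.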

Best practices
- Prompt Engineering: Always include context, e.g., "Here is accuracy_metrics.csv content: [paste or upload]. Generate Python code to load and visualize it."
- Tools Integration: Use LLMs with file-handling plugins (e.g., ChatGPT Advanced Data Analysis) to upload CSVs directly.
- Testing: Validate that splits cause no data loss (e.g., row counts match). If using APIs, base64-encode files for transmission.
- Security Note: Avoid sharing sensitive data; anonymize if needed.
(Coming soon) Sustax User's API
The Sustax User's API (SUA) is a programmatic interface designed for accessing and downloading climate data from the Sustax platform by command line. It enables users to retrieve any Sustax dataset using point-of-interest (POI) coordinates in the WGS84 latitude-longitude system. This API is particularly useful for developers, researchers, and organisations needing automated, bulk access to Sustax's models for applications like risk assessment, adaptation planning, or integration into custom tools.
The API operates on a token-based authentication system and focuses on POI-based queries. Note that the API automates data requests without price validation, generating downloadable URLs for the requested datasets.
Step 1. Authentication: Obtain a User Token
This token authenticates subsequent requests and ensures secure access to your account's data entitlements.
- Input: Provide your username and password in a POST request
- Output: If successful, the API returns a response code of "200" along with a user_token (e.g., { "response": "200", "user_token": "your_token_here" }).
- Endpoint: https://app.sustax.earth/api
Step 2. Submit Data Request
Before making a data request, users must obtain a dataset ID (e.g., for Shared Socioeconomic Pathways like SSP1-1.9 or SSP5-8.5). These Sustax codes are available right below (Sustax’s datasets tables).
No manual "pre-request" validation is needed; the system proceeds automatically to generate a request. Be careful with the pricing!
- Input: Provide the following parameters
- usr_tkn: Your authenticated user token from Step 1.
- disclaimer_acceptance: Pass 1 to accept the disclaimer
- lat: Latitude coordinate (WGS84 format, e.g., -19.828707 for a location in Mozambique).
- lon: Longitude coordinate (WGS84 format, e.g., 34.841782).
- first_year: Starting year for the data range (e.g., 2000).
- last_year: Ending year for the data range (e.g., 2080).
- Sustax code: The Sustax dataset ID(s) (e.g., for specific SSP-RCP combination(s) and variable(s)).
- Output: If successful, returns a response code of "200" with a request_url (a direct link to download the data, such as a CSV file containing climate variables, metrics, and projections) and a request_message (any status notes).
- Endpoint: https://app.sustax.earth/api (GET request)
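The two steps can be sketched with Python's standard library as follows. The credential field names (`username`, `password`) and the dataset-ID key (`sustax_code`) are assumptions where the documentation above does not spell out the exact keys; check the Sustax datasets tables and API reference for the authoritative names:

```python
import json
import urllib.parse
import urllib.request

API_URL = "https://app.sustax.earth/api"

def build_data_params(usr_tkn, lat, lon, first_year, last_year, sustax_code):
    # Parameter names follow Step 2 above; 'sustax_code' is an assumed key name
    return {
        "usr_tkn": usr_tkn,
        "disclaimer_acceptance": 1,
        "lat": lat,
        "lon": lon,
        "first_year": first_year,
        "last_year": last_year,
        "sustax_code": sustax_code,
    }

def download_url_for(username, password, **request_kwargs):
    # Step 1: POST credentials, expect {"response": "200", "user_token": "..."}
    body = urllib.parse.urlencode({"username": username, "password": password}).encode()
    with urllib.request.urlopen(urllib.request.Request(API_URL, data=body)) as resp:
        auth = json.load(resp)
    if auth.get("response") != "200":
        raise RuntimeError(f"Authentication failed: {auth}")
    # Step 2: GET with the token and query parameters, expect a request_url
    params = urllib.parse.urlencode(build_data_params(auth["user_token"], **request_kwargs))
    with urllib.request.urlopen(f"{API_URL}?{params}") as resp:
        reply = json.load(resp)
    return reply.get("request_url"), reply.get("request_message")
```

On success, `download_url_for` returns the request_url from which the ZIP described in "The CSV Output File" can be downloaded, along with any status notes in request_message.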
This generates a downloadable dataset tailored to the specified location and time period, including monthly or daily climate projections, accuracy metrics (e.g., Mean Bias Error, Pearson Correlation), and supporting details like units and timestamps.
The data is exported in the Sustax CSV standard format (as seen in sample exports), containing headers for climate variables (e.g., Maximum Mean Temperature, Consecutive No-Rain Days), units, scenarios, and metrics. It also includes timestamps and geographical details.