
DataFrame Reshaping

Keel provides DataFrame.melt to reshape wide-format data into long format. This is useful when you have a dataset where repeated measurements are stored as separate columns and you want to stack them into rows.

Wide vs Long Format

Wide format keeps each variable or time point as its own column:

nr  var1_year1  var1_year2  var2_year1  var2_year2
1   10          20          5           8
2   30          40          7           9

Long format stacks those columns into rows, with a new index column for the suffix:

nr  year  var1  var2
1   1     10    5
1   2     20    8
2   1     30    7
2   2     40    9

DataFrame.melt

DataFrame.melt takes five arguments: the DataFrame itself (piped in with |> in the example) and the four listed here:

  • id columns — column names to keep as-is (e.g. ["nr"])
  • prefixes — one prefix per output value column (e.g. ["var1_", "var2_"])
  • separator — the character between the prefix stem and the suffix (e.g. "_")
  • index name — name for the new column that holds the parsed suffix (e.g. "year")
For example:

-- melt reshapes wide data into long form by parsing column name prefixes
import DataFrame

-- Wide table: each measurement variable has one column per time point
let wide = DataFrame.fromRecords
    [ { nr = 1, var1_year1 = 10, var1_year2 = 20, var2_year1 = 5, var2_year2 = 8 }
    , { nr = 2, var1_year1 = 30, var1_year2 = 40, var2_year1 = 7, var2_year2 = 9 }
    ]

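-- Arguments to melt: id columns, prefix list, separator, index name
case (wide |> DataFrame.melt [@nr] ["var1_", "var2_"] "_" "year") of
    Ok df -> DataFrame.columns df  -- on success, list the long table's column names
    Err _ -> []                    -- on failure, fall back to an empty list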
  • ["nr"] — keep the nr column as an identifier
  • ["var1_", "var2_"] — two prefixes produce two value columns (var1, var2)
  • "_" — separator used to split prefix from suffix
  • "year" — name of the new index column

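The suffix is the part of each column name that follows the prefix. It is parsed as Int only if every suffix is numeric (for instance, a column named var1_1 would give the suffix "1" → Int 1). Otherwise it stays as String, as with "year1" and "year2" in the example above.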
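
To make the reshaping concrete, here is a minimal Python sketch of the same operation over a list of plain dictionaries. It is an illustration of the behavior described on this page, not Keel's implementation: the function name melt_records is hypothetical, the suffix is taken to be everything after the prefix (as in the Index Column Type section), suffixes are left as strings, and a missing column is filled with None, which is an assumption.

```python
def melt_records(rows, id_cols, prefixes, separator, index_name):
    """Reshape wide records into long records (illustrative sketch only)."""
    # Output column name per prefix: the prefix without its trailing separator.
    stems = {
        p: p[: -len(separator)] if separator and p.endswith(separator) else p
        for p in prefixes
    }
    out = []
    for row in rows:
        # Collect the distinct suffixes in first-seen order across all prefixes.
        suffixes = []
        for col in row:
            for p in prefixes:
                if col.startswith(p) and col[len(p):] not in suffixes:
                    suffixes.append(col[len(p):])
        # Emit one long row per suffix, carrying the id columns along.
        for suffix in suffixes:
            long_row = {c: row[c] for c in id_cols}
            long_row[index_name] = suffix  # kept as a raw string in this sketch
            for p in prefixes:
                # Assumption: a missing wide column becomes None.
                long_row[stems[p]] = row.get(p + suffix)
            out.append(long_row)
    return out


wide = [
    {"nr": 1, "var1_year1": 10, "var1_year2": 20, "var2_year1": 5, "var2_year2": 8},
    {"nr": 2, "var1_year1": 30, "var1_year2": 40, "var2_year1": 7, "var2_year2": 9},
]
long_rows = melt_records(wide, ["nr"], ["var1_", "var2_"], "_", "year")
for r in long_rows:
    print(r)
```

Each wide row yields one long row per distinct suffix, so the two input rows become four output rows with the columns nr, year, var1 and var2.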

Stem Names

The output column name for each prefix is the stem: the prefix with the trailing separator stripped. For "var1_" with separator "_", the stem is "var1". The resulting output column is named "var1".
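
The stem rule can be sketched in a few lines of Python. The helper name stem_of is hypothetical, and the behavior for a prefix that does not end with the separator (returned unchanged here) is an assumption, since the page does not cover that case.

```python
def stem_of(prefix: str, separator: str) -> str:
    """Return the prefix with one trailing separator removed, if present."""
    if separator and prefix.endswith(separator):
        return prefix[: -len(separator)]
    # Assumption: a prefix without a trailing separator is used as-is.
    return prefix


print(stem_of("var1_", "_"))  # var1
print(stem_of("var2_", "_"))  # var2
```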

Index Column Type

The index column type (year in the example) is inferred from the suffix values:

  • If every suffix across all prefixes parses as an integer → the index column is Int
  • Otherwise → the index column is String

The inference is made once per melt call, across all prefixes. In the example, var1_ and var2_ each have the suffixes "year1" and "year2". None of these four suffixes parses as an integer, so the index column is String.
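
The all-or-nothing rule above can be sketched in Python as follows. This is only an illustration: the function name infer_index_column is hypothetical, and treating "integer" as a non-empty run of ASCII digits is an assumption (Keel may accept other forms, such as a leading minus sign).

```python
import re

# Assumption: an integer suffix is one or more ASCII digits, nothing else.
_INT_PATTERN = re.compile(r"[0-9]+")


def infer_index_column(suffixes):
    """Return (type_name, values) for the index column.

    The column is Int only if every suffix parses as an integer;
    a single non-numeric suffix makes the whole column String.
    """
    if suffixes and all(_INT_PATTERN.fullmatch(s) for s in suffixes):
        return "Int", [int(s) for s in suffixes]
    return "String", list(suffixes)


# Suffixes from the example on this page: not integers, so String.
print(infer_index_column(["year1", "year2", "year1", "year2"]))
# Purely numeric suffixes: the column becomes Int.
print(infer_index_column(["1", "2", "1", "2"]))
# One non-numeric suffix is enough to keep the column as String.
print(infer_index_column(["1", "2", "1", "b"]))
```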

Next Steps