Read CSVs in F# / .NET Interactive Notebooks

Date: 2022-11-23 | fsharp | csv | notebook |

CSV (comma-separated values) is a common file format for exporting and sharing datasets. F# / .NET Interactive notebooks are built to make data analysis and REPL-like development easy.

In this post we'll walk through basic code for reading CSVs in F# / .NET interactive notebooks.

Requirements

To follow this tutorial, you'll need a few things up and running:

  • .NET SDK - This lets you run F# (via .NET). Download .NET.
  • .NET Interactive / Notebooks - This lets you run F# interactively (via .NET). To set this up, read the official F# notebooks docs.

Read CSVs

Assuming you've got your F# interactive notebook working, we can move on to reading CSVs.

For this example project, I have a folder structure like this:

* Root/
    * ReadCsvExample.ipynb // The interactive notebook
    * Resources/
        * CsvExampleValues.csv // The csv we'll be reading

We can read the CsvExampleValues.csv from our notebook like this:

ReadCsvExample.ipynb

#r "nuget: FSharp.Data"

open FSharp.Data
open System.IO // Directory lives in System.IO

let csv = CsvFile.Load(
    Directory.GetCurrentDirectory() + "/Resources/CsvExampleValues.csv")

What this code does:

  • #r "nuget: FSharp.Data" - Installs FSharp.Data from NuGet. If you skip this, you'll end up using the wrong FSharp.Data.* package (due to namespace collisions) and CsvFile won't exist. #r is how you reference packages in F# Interactive.
  • We then open FSharp.Data (the one we installed) and System.IO, which provides Directory.
  • We use the CsvFile.Load function to read our CSV file and make it available to our code.

If you're following along in your own notebook - try this out first and verify your CSV loads before going further.
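
If the load fails, a likely culprit is that the notebook's working directory is not the folder you expect. Here is a minimal sketch of a guard for that. Note that tryLoadCsv is a hypothetical helper name (it is not part of FSharp.Data); it only wraps File.Exists around the same CsvFile.Load call used above.

```fsharp
#r "nuget: FSharp.Data"

open System.IO
open FSharp.Data

// Hypothetical helper: returns None instead of throwing when the file is missing.
let tryLoadCsv (path: string) : CsvFile option =
    if File.Exists path then Some (CsvFile.Load(path)) else None

// Path.Combine picks the right separator for the current OS.
let csvPath =
    Path.Combine(Directory.GetCurrentDirectory(), "Resources", "CsvExampleValues.csv")

match tryLoadCsv csvPath with
| Some loaded -> printfn "Loaded %d rows from %s" (Seq.length loaded.Rows) csvPath
| None -> printfn "CSV not found at: %s" csvPath
```

Printing the resolved path in the failure case makes it easy to see where the notebook is actually looking.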

Iterate over CSV Data

Now that our CSV data is loaded, we probably want to do something with it, which usually starts with iterating over it. We can't cover every way to iterate over this data here, but we'll cover the most common ones; refer to the official F# CSV Parser docs for more.

The contents of CsvExampleValues.csv are simple:

LetterCol,NumberCol
A,1
B,2
C,3

Iterating over CSV rows:

ReadCsvExample.ipynb

printfn "Row Count: %A" (csv.Rows |> Seq.length)
// Output: Row Count: 3

// Seq.iter runs printfn once per row without building a throwaway list
csv.Rows
|> Seq.iter (fun r -> printfn "Row: %A" (r.ToDisplayString()))
// Output: Row: "{ FSharp.Data.CsvRow: Columns: [ A, 1 ] }"
// Output: Row: "{ FSharp.Data.CsvRow: Columns: [ B, 2 ] }"
// Output: Row: "{ FSharp.Data.CsvRow: Columns: [ C, 3 ] }"
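
Once you can iterate over rows, the next thing you'll usually want is the individual cell values. The sketch below is one way to do that, not the only one: it reads cells by header name with the row indexer and converts the numeric column with AsInteger() from FSharp.Data.CsvExtensions. To keep it runnable without the file, it uses CsvFile.Parse on an inline copy of CsvExampleValues.csv; the names sample, pairs, and total are just for illustration.

```fsharp
#r "nuget: FSharp.Data"

open FSharp.Data
open FSharp.Data.CsvExtensions // adds AsInteger() and similar converters on strings

// Inline copy of CsvExampleValues.csv so the sketch runs without the file.
let sample = CsvFile.Parse("LetterCol,NumberCol\nA,1\nB,2\nC,3")

// Every cell is a string until converted; look cells up by header name.
let pairs =
    sample.Rows
    |> Seq.map (fun row -> row.["LetterCol"], row.["NumberCol"].AsInteger())
    |> List.ofSeq

for (letter, number) in pairs do
    printfn "%s -> %d" letter number

let total = pairs |> List.sumBy snd
printfn "Sum of NumberCol: %d" total
// Output: Sum of NumberCol: 6
```

Materializing the rows into a list first means the CSV is only enumerated once, and the typed values can be reused afterwards.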

Iterating over CSV headers:

ReadCsvExample.ipynb

printfn "Row Headers: %A" (csv.Headers.ToDisplayString())
// Output: Row Headers: "{ Some(System.String[]): Value: [ LetterCol, NumberCol ] }"

match csv.Headers with
| Some headers ->
    // Array.iter is the idiomatic choice for side effects like printing
    headers |> Array.iter (fun c -> printfn "Header: %A" c)
| None -> printfn "No headers found"
// Output: Header: "LetterCol"
// Output: Header: "NumberCol"
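
Headers and rows are often most useful together. As a rough sketch, the snippet below pairs each header with the matching cell in a row. describeRow is a hypothetical helper (not part of FSharp.Data), and it assumes every row has as many columns as there are headers, since Array.zip throws when the lengths differ.

```fsharp
#r "nuget: FSharp.Data"

open FSharp.Data

// Inline copy of CsvExampleValues.csv so the sketch runs without the file.
let sample = CsvFile.Parse("LetterCol,NumberCol\nA,1\nB,2\nC,3")

// Hypothetical helper: formats one row as "Header=value, Header=value".
// Assumes headers and row.Columns have the same length.
let describeRow (headers: string[]) (row: CsvRow) : string =
    Array.zip headers row.Columns
    |> Array.map (fun (header, value) -> sprintf "%s=%s" header value)
    |> String.concat ", "

match sample.Headers with
| Some headers ->
    sample.Rows |> Seq.iter (fun row -> printfn "%s" (describeRow headers row))
| None -> printfn "No headers found"
```

For the example file, the first line printed should be LetterCol=A, NumberCol=1.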

Next Steps

That should get you started with CSVs in F# / .NET interactive notebooks!

Appendix

Full Source Code

In case you want the full notebook source code:

ReadCsvExample.ipynb

#r "nuget: FSharp.Data"

open FSharp.Data
open System.IO // Directory lives in System.IO

let csv = CsvFile.Load(
    Directory.GetCurrentDirectory() + "/Resources/CsvExampleValues.csv")

// **Rows**

printfn "Row Count: %A" (csv.Rows |> Seq.length)
// Output: Row Count: 3

// Seq.iter runs printfn once per row without building a throwaway list
csv.Rows
|> Seq.iter (fun r -> printfn "Row: %A" (r.ToDisplayString()))
// Output: Row: "{ FSharp.Data.CsvRow: Columns: [ A, 1 ] }"
// Output: Row: "{ FSharp.Data.CsvRow: Columns: [ B, 2 ] }"
// Output: Row: "{ FSharp.Data.CsvRow: Columns: [ C, 3 ] }"

// **Headers**

printfn "Row Headers: %A" (csv.Headers.ToDisplayString())
// Output: Row Headers: "{ Some(System.String[]): Value: [ LetterCol, NumberCol ] }"

match csv.Headers with
| Some headers ->
    // Array.iter is the idiomatic choice for side effects like printing
    headers |> Array.iter (fun c -> printfn "Header: %A" c)
| None -> printfn "No headers found"
// Output: Header: "LetterCol"
// Output: Header: "NumberCol"
