sofetch: A low-boilerplate Haxl-like data fetching library

[ bsd3, data, library, unclassified ]

Please see the README on GitHub at https://github.com/githubuser/sofetch#readme



Flags

Manual Flags

NameDescriptionDefault
examples

Build example executables

Disabled

Use -f <flag> to enable a flag, or -f -<flag> to disable that flag.



Candidates

  • No Candidates
Versions [RSS] 0.1.0.0, 0.1.0.1, 0.1.0.2
Change log CHANGELOG.md
Dependencies aeson, async, base (>=4.7 && <5), bytestring, containers, exceptions, hashable, http-client, http-client-tls, http-types, semigroupoids, sofetch, sqlite-simple, text, time, transformers, unliftio-core, unordered-containers [details]
License BSD-3-Clause
Copyright 2026 Ian Duncan
Author Ian Duncan
Maintainer ian@iankduncan.com
Uploaded by IanDuncan at 2026-02-14T21:01:19Z
Category Data
Home page https://github.com/iand675/sofetch#readme
Bug tracker https://github.com/iand675/sofetch/issues
Source repo head: git clone https://github.com/iand675/sofetch
Distributions
Reverse Dependencies 1 direct, 0 indirect [details]
Executables sqlite-blog, github-explorer
Downloads 5 total (5 in the last 30 days)
Rating (no votes yet) [estimated by Bayesian average]
Status Docs available [build log]
Last success reported on 2026-02-14 [all 1 reports]

Readme for sofetch-0.1.0.2

[back to package description]

sofetch

That's so fetch



The problem

Suppose you have a web page that shows a list of blog posts, each with its author's name. A naive implementation fetches each author one at a time:

-- Fetch each author individually, one query per post!
renderPosts :: [Post] -> AppM [Html]
renderPosts posts = forM posts $ \post -> do
  author <- getUser (postAuthorId post)    -- DB round-trip
  pure (renderPostCard post author)

Ten posts means ten separate database queries. A hundred posts means a hundred queries. This is the N+1 problem: you run 1 query to get the list, then N more queries to get each related item. It's one of the most common performance pitfalls in data-access code, and it's easy to introduce without noticing because each function in isolation looks perfectly reasonable.

The typical fix is to restructure your code: collect all the IDs up front, run a single batched query, then stitch the results back together. That works, but it forces your code shape to match your optimisation strategy. Composition suffers: you can't freely combine small functions without worrying about the data-access pattern they produce.
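A hedged sketch of that manual restructuring, using stand-in types (the Post and User records and the getUsersByIds batched lookup are hypothetical stand-ins, not part of any library):

```haskell
import qualified Data.Map.Strict as Map
import Data.Maybe (mapMaybe)

data Post = Post { postId :: Int, postAuthorId :: Int } deriving (Eq, Show)
data User = User { userId :: Int, userName :: String } deriving (Eq, Show)

-- The manual fix: collect the IDs up front, run ONE batched lookup,
-- then stitch each post back together with its author.
-- getUsersByIds stands in for a single "WHERE id IN (...)" query.
renderPostsBatched :: Monad m
                   => ([Int] -> m [User])   -- batched lookup, one round-trip
                   -> [Post]
                   -> m [(Post, User)]
renderPostsBatched getUsersByIds posts = do
  users <- getUsersByIds (map postAuthorId posts)
  let byId = Map.fromList [(userId u, u) | u <- users]
  pure (mapMaybe (\p -> (,) p <$> Map.lookup (postAuthorId p) byId) posts)
```

The query is batched, but the traversal and the stitching logic are now welded together, which is exactly the loss of composability described above.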

The solution

sofetch fixes this automatically. Write simple, sequential-looking code, and sofetch batches and deduplicates your data access behind the scenes:

renderPosts :: (MonadFetch m n, DataSource m UserById) => [Post] -> n [Html]
renderPosts posts =
  -- All author fetches are batched into ONE query, automatically.
  fetchThrough (UserById . postAuthorId) posts
    <&> map (\(post, author) -> renderPostCard post author)

No matter how many posts you have, this issues a single WHERE id IN (...) query for all the authors. You didn't have to restructure anything. You wrote the obvious code and sofetch made it fast.

(Figure: N+1 queries vs. one batched query with sofetch)

This works across function boundaries too. If renderPostCard internally fetches comment counts, and renderSidebar fetches the same authors for a "top contributors" widget, sofetch merges all of those fetches together. Functions that were written independently, without any knowledge of each other, still get optimal batching when composed.

How it works (in brief)

sofetch gives you a special Fetch monad. When you write:

(,) <$> fetch (UserById 1) <*> fetch (UserById 2)

...the two fetches don't happen immediately. Instead, sofetch collects them into a round, groups them by data source, and dispatches one batched call per source. The <*> operator (or ApplicativeDo if you prefer do-notation) is the signal that two fetches are independent and can be batched together. The >>= operator (monadic bind) introduces a round boundary: the right side depends on the left side's result, so it has to wait.
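For contrast, a hedged sketch of a dependent fetch (UserById and PostsByAuthor are the key types defined in the quick start below; postsForUser is a hypothetical name). Because the second fetch consumes the first's result, the bind forces two rounds:

```haskell
-- Hypothetical sketch against the sofetch API, not a standalone program.
postsForUser :: (MonadFetch m n, DataSource m UserById, DataSource m PostsByAuthor)
             => Int -> n [Post]
postsForUser uid = do
  user <- fetch (UserById uid)           -- round 1
  fetch (PostsByAuthor (userId user))    -- round 2: depends on round 1's result
```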

flowchart LR
  f1["fetch (UserById 1)"] --> b1["batchFetch<br/>[UserById 1, 2]"]
  f2["fetch (UserById 2)"] --> b1
  f3["fetch (PostsByAuthor 1)"] --> b2["batchFetch<br/>[PostsByAuthor 1]"]
  b1 -. "concurrent" .- b2

Within each round:

  • Keys for the same data source are grouped into one batchFetch call.
  • Keys for different data sources run concurrently.
  • Duplicate keys are deduplicated. The same key appearing in multiple places produces only one fetch, and all callers share the result.
  • Results are cached so the same key never hits the database twice (unless you opt out).
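Deduplication means even deliberately redundant code stays cheap; a minimal sketch assuming the UserById key from the quick start:

```haskell
-- Hypothetical sketch: both arms request the same key, so sofetch
-- dispatches a single batchFetch and both positions share the result.
sameUserTwice :: (MonadFetch m n, DataSource m UserById) => n (User, User)
sameUserTwice = (,) <$> fetch (UserById 1) <*> fetch (UserById 1)
```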

Quick start

1. Define key types

Each kind of data you want to fetch gets a key type, a small type that says "I want to look up this thing" and declares what the result will be. This is the core modelling step: one key type per query shape.

{-# LANGUAGE DeriveGeneric, DeriveAnyClass, DerivingStrategies, TypeFamilies #-}

data User = User { userId :: Int, userName :: Text }
data Post = Post { postId :: Int, postAuthorId :: Int, postTitle :: Text }

-- "Give me a user by their ID"
newtype UserById = UserById Int
  deriving stock (Eq, Ord, Show, Generic)
  deriving anyclass (Hashable)

instance FetchKey UserById where
  type Result UserById = User

-- "Give me all posts by this author"
newtype PostsByAuthor = PostsByAuthor Int
  deriving stock (Eq, Ord, Show, Generic)
  deriving anyclass (Hashable)

instance FetchKey PostsByAuthor where
  type Result PostsByAuthor = [Post]

The key type carries the query parameter (the user ID, the author ID) and the FetchKey instance tells sofetch what type the answer will be. All the required instances come straight from deriving clauses (Eq, Ord, and Show via stock deriving; Hashable via Generic and DeriveAnyClass), so there is no hand-written boilerplate.

2. Teach sofetch how to fetch them

A DataSource instance tells sofetch how to batch-fetch a group of keys. You receive a NonEmpty list of keys and return a HashMap of results, one entry per key:

instance DataSource AppM UserById where
  batchFetch keys = do
    pool <- asks appPool
    let ids = [uid | UserById uid <- toList keys]
    rows <- liftIO $ withResource pool $ \conn ->
      query conn "SELECT id, name FROM users WHERE id = ANY(?)" (Only ids)
    pure $ HM.fromList [(UserById (userId u), u) | u <- rows]

The AppM parameter is your monad. If it has access to a connection pool, config, or anything else, your data source has access to it too. No special environment setup is needed.

If your backend doesn't support batch lookups (e.g. a REST API that only fetches one item at a time), implement fetchOne instead. sofetch will call it for each key:

instance DataSource AppM UserById where
  fetchOne (UserById uid) = lookupUserById uid

You still get deduplication and caching; you just don't get the batched SQL.

3. Write data-access code

Now use fetch in your application code. Program against the MonadFetch typeclass so your functions work with any implementation (production, tests, tracing):

getUserFeed :: (MonadFetch m n, DataSource m UserById, DataSource m PostsByAuthor)
            => Int -> n (User, [Post])
getUserFeed uid =
  (,) <$> fetch (UserById uid) <*> fetch (PostsByAuthor uid)

These two fetches are independent (<*>), so sofetch batches them into a single round. If you prefer do-notation, enable ApplicativeDo and write the equivalent:

{-# LANGUAGE ApplicativeDo #-}

getUserFeed uid = do
  user  <- fetch (UserById uid)        -- batched together
  posts <- fetch (PostsByAuthor uid)   -- in one round
  pure (user, posts)

Both forms produce identical batching behaviour.

4. Run it

handleRequest :: AppEnv -> Int -> IO (User, [Post])
handleRequest env uid = runAppM env $ do
  cfg <- fetchConfigIO
  runFetch cfg (getUserFeed uid)

fetchConfigIO works for any MonadUnliftIO monad (which includes any ReaderT env IO stack, the most common pattern). It wires everything up automatically.

5. Test it

Swap the real data sources for canned data. No IO, no database:

testGetUserFeed :: IO ()
testGetUserFeed = do
  let mocks = mockData @UserById       [(UserById 1, testUser)]
           <> mockData @PostsByAuthor   [(PostsByAuthor 1, [testPost])]
  (user, posts) <- runMockFetch @AppM mocks (getUserFeed 1)
  assertEqual user testUser
  assertEqual posts [testPost]

Because getUserFeed is polymorphic over MonadFetch, it runs unchanged against MockFetch. No special test wiring needed.

A real example: collapsing N+1 cascades

Here's a scenario from the included SQLite example. A blog page needs to render three authors, each with their posts, each post with its comments, each comment with its author name. The functions are written independently at four different levels:

renderBlogPage                    fetches 3 authors
  └─ renderAuthorProfile          fetches posts for an author
       └─ renderPostWithComments  fetches comments for a post
            └─ renderComment      fetches the comment's author

Without sofetch, this is 25+ database queries. With sofetch, traverse automatically merges fetches at the same depth:

flowchart LR
  subgraph R1 ["Round 1"]
    A1["UserById 1, 2, 3"]
  end
  subgraph R2 ["Round 2"]
    A2["PostsByAuthor 1, 2, 3"]
  end
  subgraph R3 ["Round 3"]
    A3["CommentsByPost 1 … 7"]
  end
  subgraph R4 ["Round 4"]
    A4["UserById 4, 5 (deduped)"]
  end
  R1 --> R2 --> R3 --> R4

4 rounds, 4 SQL queries, regardless of the data size. The functions never coordinate with each other. They don't know they're being composed. sofetch handles it.
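The shape of that cascade, sketched against the quick-start keys (renderPostWithComments and the () result type are hypothetical simplifications; only the traverse structure matters):

```haskell
import Data.Foldable (traverse_)

-- Hypothetical sketch: each level is an independent traverse, so all
-- fetches at the same depth land in the same round automatically.
renderBlogPage :: (MonadFetch m n, DataSource m UserById, DataSource m PostsByAuthor)
               => [Int] -> n ()
renderBlogPage authorIds = do
  authors <- fetchAll (map UserById authorIds)    -- round 1: one query, all authors
  traverse_ renderAuthorProfile authors           -- deeper fetches merge per depth

renderAuthorProfile :: (MonadFetch m n, DataSource m PostsByAuthor) => User -> n ()
renderAuthorProfile author = do
  posts <- fetch (PostsByAuthor (userId author))  -- round 2: merged across authors
  traverse_ renderPostWithComments posts          -- rounds 3 and 4, same pattern
```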

Key features

  • No GADTs. Data sources are ordinary typeclasses. Key types use stock deriving. If you've defined a newtype, you're 90% of the way to a data source.
  • Your monad, your resources. DataSource is parameterised by your monad, not some framework environment. Connection pools, config, whatever your monad carries, your data sources have access to it. Missing instances are compile-time errors, not runtime crashes.
  • Monad transformer. Fetch m a layers over your existing monad stack. Drop it in without restructuring your application.
  • Swappable implementations. MonadFetch is the interface your application code uses. Production, test, and traced implementations all satisfy it. Swap without code changes.
  • Extensible instrumentation. runLoopWith lets you wrap each batch round (e.g. with tracing spans). OpenTelemetry support lives in the separate sofetch-otel package.
flowchart TD
  A["Application code"] -->|"programs against"| B["MonadFetch (typeclass)"]
  B --> C["Fetch m<br/>production"]
  B --> D["MockFetch<br/>testing"]
  B --> E["TracedFetch<br/>instrumentation"]
  C --> F["DataSource instances<br/>UserById · PostsByAuthor · …"]

Combinators

sofetch includes a toolkit for common patterns:

| Combinator | What it does |
| --- | --- |
| fetchAll keys | Fetch a list of keys in one round |
| fetchThrough toKey items | Extract a key from each item, fetch, pair back |
| fetchMap toKey combine items | Like fetchThrough but transform the pair |
| fetchMaybe maybeKey | Fetch if the key is present |
| fetchMapWith keys | Fetch a collection, return a HashMap of results |
| filterA predicate items | Applicative filter; all predicates batched |
| withDefault val action | Return a default on any exception |
| pAnd / pOr | Parallel short-circuiting boolean combinators |
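A couple of these in context (hedged sketches reusing the quick-start keys; isActive is a hypothetical field accessor on User):

```haskell
-- fetchThrough: pair each post with its author, one batched round.
postsWithAuthors :: (MonadFetch m n, DataSource m UserById)
                 => [Post] -> n [(Post, User)]
postsWithAuthors = fetchThrough (UserById . postAuthorId)

-- filterA: every predicate's fetch lands in the same round.
activeUserIds :: (MonadFetch m n, DataSource m UserById) => [Int] -> n [Int]
activeUserIds = filterA (\uid -> isActive <$> fetch (UserById uid))
```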

Advanced usage

Shared cache across phases

To preserve the cache across sequential computations, use runFetch', which returns the cache alongside the result:

handleTwoPhases :: AppEnv -> IO [Post]
handleTwoPhases env = runAppM env $ do
  cfg <- fetchConfigIO

  -- Phase 1: populate cache
  (_users, cache) <- runFetch' cfg $
    fetchAll [UserById 1, UserById 2, UserById 3]

  -- Phase 2: cached keys resolve without hitting the DB
  runFetch cfg { configCache = Just cache } $
    fetchAll [PostsByAuthor 1, PostsByAuthor 2]

Restricted monads (no MonadIO)

For monads that deliberately hide IO (e.g. a Transaction type that prevents arbitrary IO inside database transactions), use fetchConfig with explicit natural transformations and export a safe runner:

fetchInTransaction :: Fetch Transaction a -> Transaction a
fetchInTransaction = runFetch (fetchConfig unsafeRunTransaction unsafeLiftIO)

The unsafe escape hatches stay private to your DB module. Application code calls fetchInTransaction and never touches IO.

See examples/SqliteBlog.hs (scenario 12) for a worked proof-of-concept.

Examples

The examples/ directory contains two runnable programs:

stack build --flag sofetch:examples
stack exec sqlite-blog
stack exec github-explorer

SQLite blog (examples/SqliteBlog.hs): A blog platform backed by in-memory SQLite. Every batchFetch prints its SQL so you can see exactly how fetches are batched. Covers applicative batching, N+1 avoidance, deduplication, deep N+1 across function boundaries, faceted queries, chunked batching, shared caches, mocks, and restricted monads.

GitHub explorer (examples/GitHubExplorer.hs): Concurrent exploration of the GitHub REST API. Demonstrates sofetch with HTTP backends where the value is concurrency, deduplication, and caching rather than SQL batching.

Packages

| Package | Description |
| --- | --- |
| sofetch | Core library: Fetch, DataSource, MonadFetch, cache, engine, mocks, tracing hooks |
| sofetch-otel | OpenTelemetry instrumentation via runFetchWithOTel |

Modules

| Module | Contents |
| --- | --- |
| Fetch | Top-level re-exports |
| Fetch.Class | FetchKey, DataSource, MonadFetch, MonadFetchBatch, Status, Batches |
| Fetch.Batched | Fetch monad transformer, runners, runLoopWith |
| Fetch.Engine | Batch dispatch with strategy-based scheduling |
| Fetch.Cache | IVar-based cache with dedup, eviction, warming |
| Fetch.IVar | Write-once variable with error support |
| Fetch.Combinators | fetchAll, fetchThrough, fetchMap, etc. |
| Fetch.Mock | MockFetch for testing |
| Fetch.Traced | TracedFetch with per-round callbacks |
| Fetch.Mutate | Mutate for interleaved read-write computations |
| Fetch.Memo | MemoStore, memo, memoOn |
| Fetch.Deriving | Helpers for writing instances (optionalBatchFetch, DerivingVia docs) |

Design

See docs/DESIGN.md for the full set of design decisions and tradeoffs.


sofetch is inspired by Facebook's Haxl (Marlow et al., There is no fork: an abstraction for efficient, concurrent, and concise data access, ICFP 2014). It keeps the core idea (write sequential-looking code, get batched data access) while replacing the GADT-based data source API with type families and ordinary typeclasses, and using a monad-transformer design instead of a bespoke environment. See DESIGN.md for a detailed comparison.