Haskell Gem: Unwrapping Indented Text

While processing some email, I needed a script to unwrap indented lines in email headers. I wrote a Haskell program that turned out to be a short and sweet demonstration of simple but interesting Haskell features (pattern matching, guards, type inference, function composition, and the humble cons) so I thought I’d share it.

If you’re new to Haskell,

  • [] is the empty list
  • [a] is a list with a single element a
  • x : y creates a new list by placing element x at the front of list y
  • ++ concatenates two lists
  • interact takes a single argument, a function from string to string, and returns an IO action that applies that function to the contents of STDIN and prints the result
  • . (period) is a binary operator that composes functions
  • foldr is right-fold, and behaves like this: foldr f d [a, b, c] == f a (f b (f c d))
  • Lines of the form | a = b are guards, which evaluate to b if a evaluates to True
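
As a rough sketch of how the pieces listed above can fit together, here is one way to write such a program. It is an illustration under assumptions, not necessarily the exact program this post describes: the helper names indented and step are made up for this example, and "indented" is assumed to mean "starts with any whitespace character".

```haskell
-- Unwrap.hs: join indented continuation lines onto the line before them.
import Data.Char (isSpace)

-- Assumption: a line is a continuation if it starts with whitespace.
indented :: String -> Bool
indented (c : _) = isSpace c
indented []      = False

-- Fold from the right: when the line that follows the current one is
-- indented, strip its indentation and glue it onto the current line
-- with a single space. Otherwise keep the current line as it is.
unwrap :: [String] -> [String]
unwrap = foldr step []
  where
    step line (next : rest)
      | indented next = (line ++ " " ++ dropWhile isSpace next) : rest
    step line acc = line : acc

main :: IO ()
main = interact (unlines . unwrap . lines)
```

Because unwrap is a pure function from a list of lines to a list of lines, it can be checked without any IO, for example unwrap ["a", "  b"] == ["a b"]. In this sketch an indented first line has nothing to attach to, so it is left unchanged.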

Example usage:

[/tmp]% cat > header
This is a normal line.
This is a slightly longer line that
  wraps with two spaces.
A short line.
Another example of a long line that
    wraps with tabs (not just once,
    but twice.
Final line.
[/tmp]% cat header | runhaskell Unwrap.hs
This is a normal line.
This is a slightly longer line that wraps with two spaces.
A short line.
Another example of a long line that wraps with tabs (not just once, but twice.
Final line.

Care to share the same program implemented in your language of choice?

16 comments

  1. Rudolf O. wrote,

    The example output is unclear to me. Shouldn’t it wrap around after every newline? I wrote up something as a literate C program, and it’s posted here: http://neverfriday.com/temp/unwrap/unwrap.html

    The outputted source code is http://neverfriday.com/temp/unwrap/unwrap.c

    but I’m not sure if the output is correct.

    The output generated can be viewed here: http://neverfriday.com/temp/unwrap/output.txt

  2. Andre wrote,

    -- wrap.lua : wrap space-padded lines
    for line in io.lines("header") do -- remove the "header" to read from stdin
      if not string.match(line, "^%s") then print() end
      io.write((string.gsub(line, "^%s+", " ")))
    end

  3. David wrote,

    Andre, can you write a pure function like unwrap? Your solution is impure and difficult to test.

  4. David wrote,

    Rudolf,

    The goal is to write a function that preserves the “real” lines while unwrapping the indented lines. New lines that do not begin with whitespace should be preserved.

    The unwrap function is the first step in more email filtering so it should return the unwrapped lines, not print them. Also, writing unwrap as a pure function makes it easier to unit test.

  5. Andre wrote,

    Sorry, but Lua doesn’t lend itself well to pure functional programming (no map, etc.). Still, I disagree with your assertion that it is difficult to test.

  6. David wrote,

    Andre, then try a pure procedural solution. See my response to Rudolf. Your solution is fine but makes it hard to compare because it prints directly instead of doing a pure transformation.

  7. Andre wrote,

    If you insist that unwrap be a function of a list of strings returning an unwrapped list of strings, an impure Lua solution would be:

    function unwrap(lines)
      local result = {}
      for _, line in ipairs(lines) do
        if string.match(line, "^%s") then
          result[#result] = (result[#result] or "") .. (string.gsub(line, "^%s+", " "))
        else
          table.insert(result, line)
        end
      end
      return result
    end

  8. Igor wrote,

    There’s a Perl one-liner for ya:

    perl -e 'sub unwrap {$one = join("", @_); $one =~ s/\n\s+//g; split("\n",$one)} @in = ; print join("\n",unwrap(@in)),"\n";' < header

    The basic logic is the same as the Haskell original, but I used a regular expression for the unwrapping part. The unwrap function in there receives a list of strings and returns a list of "unwrapped" strings.

    I admit, this "line" is a bit long, but… :)

  9. Igor wrote,

    Oops, looks like the HTML sanitizer ate my characters :) So, the part that looks like @in = ; should look like @in = <>;

  10. David wrote,

    Andre: awesome, that’s exactly what I was looking for to be able to compare solutions.

    Igor: Very cool!

  11. Rudolf O. wrote,

    Okay, I think I figured it out: http://neverfriday.com/temp/unwrap/unwrap.html

    Also, I didn’t see that it was just one step as part of something more. I’m justifying my choice to make this a small program with the fact that it would be easy to chain things up on a GNU/Linux system ;)

    I mean, you *are* using cat, piping ;p

  12. Paul Battley wrote,

    In Ruby:

    $stdout << $stdin.read.gsub(/\n +/, "")

    It does read the entire file into memory first, though.

  13. David wrote,

    Paul, I didn’t say it explicitly, but the goal is to create a pure function that unwraps indented lines only, preserving “real lines”.

  14. Paul Battley wrote,

    There’s a typo in my solution ("" should be " ") but I’m not sure what you mean by “real lines”. If you mean that you want a pure function that maps lines with indented lines to lines with the indented lines concatenated, then you could do this:

    http://gist.github.com/227510

    But that’s not how I’d implement it in Ruby by choice!

  15. Yitz wrote,

    Nice post. It was linked on the Haskell reddit: http://www.reddit.com/r/haskell/comments/a14mv/haskell_gem_unwrapping_indented_text/

    More comments there.

  16. David wrote,

    Paul, very cool, thank you for both of your interesting solutions.

    Yitz, thanks for the heads up and I like your insane one-liner :) I tried to write something that non-Haskellers could find interesting, but I do think your approach is totally rad.