While processing some email, I needed a script to unwrap indented lines in email headers. I wrote a Haskell program that turned out to be a short and sweet demonstration of simple but interesting Haskell features (pattern matching, guards, type inference, function composition, and the humble cons), so I thought I’d share it.
If you’re new to Haskell:
- [] is the empty list
- [a] is a list with a single element a
- x : y creates a new list by placing element x at the front of list y
- ++ concatenates two lists
- interact is a function taking a single argument, a function that takes a string and returns a string, and returns a bit of IO that runs the string transformation function on the contents of STDIN and prints the result
- . (period) is a binary operator that composes functions
- foldr is right-fold, and behaves like this: foldr f d [a, b, c] == f a (f b (f c d))
- Lines of the form | a = b are guards, which evaluate to b if a evaluates to True (otherwise Haskell tries the next guard or equation)
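The following is a minimal sketch of an unwrap function built from the pieces above. It is a hypothetical reconstruction, not necessarily the original program, and it assumes that a continuation line is any line whose first character is whitespace (space or tab), joined to the line above with a single space. It uses pattern matching, a guard, cons, ++, foldr, and type inference (the local step function has no signature).

```haskell
import Data.Char (isSpace)

-- Hypothetical sketch, not the original program: join every indented
-- (continuation) line onto the line above it, replacing the continuation's
-- leading whitespace with a single space.
unwrap :: [String] -> [String]
unwrap = foldr step []
  where
    -- step sees one line x and the already-unwrapped lines below it.
    -- If the next line is indented, cons the joined line onto the rest.
    -- foldr works from the right, so a line that wraps more than once is
    -- glued together from the bottom up.
    step x (y : ys)
      | isIndented y = (x ++ " " ++ dropWhile isSpace y) : ys
    -- Otherwise (the guard failed, or there is no next line) keep x as is.
    step x ys = x : ys

-- A line is a continuation if it starts with whitespace (space or tab).
isIndented :: String -> Bool
isIndented (c : _) = isSpace c
isIndented []      = False
```

To run it as a script like Unwrap.hs, a main such as main = interact (unlines . unwrap . lines) should be enough: lines splits STDIN into lines, unwrap joins the continuations, and unlines puts the newlines back.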
Example usage:
[/tmp]% cat > header
This is a normal line.
This is a slightly longer line that
  wraps with two spaces.
A short line.
Another example of a long line that
	wraps with tabs (not just once,
	but twice.
Final line.
[/tmp]% cat header | runhaskell Unwrap.hs
This is a normal line.
This is a slightly longer line that wraps with two spaces.
A short line.
Another example of a long line that wraps with tabs (not just once, but twice.
Final line.
Care to share the same program implemented in your language of choice?
16 comments
The example output is unclear to me. Shouldn’t it wrap around after every newline? I wrote up something as a literate C program, and it’s posted here: http://neverfriday.com/temp/unwrap/unwrap.html
The generated source code is http://neverfriday.com/temp/unwrap/unwrap.c, but I’m not sure if the output is correct.
The output generated can be viewed here: http://neverfriday.com/temp/unwrap/output.txt
-- wrap.lua : wrap space-padded lines
for line in io.lines("header") do -- remove the "header" to read from stdin
  if not string.match(line, "^%s") then print() end
  io.write((string.gsub(line, "^%s+", " ")))
end
Andre, can you write a pure function like unwrap? Your solution is impure and difficult to test.
Rudolf,
The goal is to write a function that preserves the “real” lines while unwrapping the indented lines. New lines that do not begin with whitespace should be preserved.
The unwrap function is the first step in more email filtering so it should return the unwrapped lines, not print them. Also, writing unwrap as a pure function makes it easier to unit test.
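To make the "easier to unit test" point concrete, here is a hypothetical alternative formulation (it uses span rather than foldr, and the name unwrapSpan is invented for this sketch). It states the goal directly: every real line, one that does not start with whitespace, begins a new output line, and the run of indented lines right after it is appended with single spaces.

```haskell
import Data.Char (isSpace)

-- Hypothetical sketch, not from the post: each real line is glued to the
-- run of indented continuation lines that immediately follows it.
unwrapSpan :: [String] -> [String]
unwrapSpan [] = []
unwrapSpan (x : xs) = unwords (x : map (dropWhile isSpace) conts) : unwrapSpan rest
  where
    -- conts: the continuation lines right after x;
    -- rest: everything from the next real line onward.
    (conts, rest) = span isIndented xs

-- A line is a continuation if it starts with whitespace (space or tab).
isIndented :: String -> Bool
isIndented (c : _) = isSpace c
isIndented []      = False
```

Because unwrapSpan is a pure [String] -> [String] function, a unit check is just a comparison such as unwrapSpan ["a", "  b", "c"] == ["a b", "c"], with no STDIN or STDOUT involved.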
Sorry, but Lua doesn’t lend itself well to pure functional programming (no map, etc.); still, I disagree with your assertion that it is difficult to test.
Andre, then try a pure procedural solution. See my response to Rudolf. Your solution is fine but makes it hard to compare because it prints directly instead of doing a pure transformation.
If you insist that unwrap be a function of a list of strings returning an unwrapped list of strings, an impure Lua solution would be:
function unwrap(lines)
  local result = {}
  for _, line in ipairs(lines) do
    if string.match(line, "^%s") then
      -- indented line: append it to the previous line, collapsing its leading whitespace to one space
      result[#result] = (result[#result] or "") .. (string.gsub(line, "^%s+", " "))
    else
      table.insert(result, line)
    end
  end
  return result
end
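For comparison, the same left-to-right, append-to-the-last-line strategy can be written as a pure Haskell fold. This is a hypothetical sketch (unwrapLoop is an invented name): because Haskell lists are cheap to extend at the front, the accumulator keeps finished lines in reverse order so that the "previous line" is its head, and the result is reversed at the end.

```haskell
import Data.Char (isSpace)
import Data.List (foldl')

-- Hypothetical sketch mirroring the Lua loop: walk the lines left to right,
-- appending each indented line to the most recent output line.
unwrapLoop :: [String] -> [String]
unwrapLoop = reverse . foldl' step []
  where
    -- The accumulator holds output lines in reverse order, so the most
    -- recent line is its head and can be extended without walking the list.
    step (prev : done) line
      | startsWithSpace line = (prev ++ " " ++ dropWhile isSpace line) : done
    -- A real line (or the very first line) starts a new output line.
    step done line = line : done

    startsWithSpace (c : _) = isSpace c
    startsWithSpace []      = False
```

One difference, as far as I can tell: if the very first line is indented, the Lua loop writes it to result[0], which ipairs does not visit, so that line is effectively dropped; this sketch keeps it as its own line.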
There’s a Perl one-liner for ya:
perl -e 'sub unwrap {$one = join("", @_); $one =~ s/\n\s+/ /g; split("\n",$one)} @in = <>; print join("\n",unwrap(@in)),"\n";' < header
The basic logic is the same as the Haskell original, but I used a regular expression for the unwrapping part. The unwrap function in there receives a list of strings and returns a list of "unwrapped" strings.
I admit, this "line" is a bit long, but…
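This regex approach works on the whole text rather than on a list of lines. A rough Haskell counterpart without a regex library (hypothetical sketch; unwrapText is an invented name) replaces each newline that is followed by spaces or tabs, together with that run of spaces and tabs, with a single space. Unlike Perl's \s, it deliberately does not treat a newline as indentation, so blank lines are left alone.

```haskell
-- Hypothetical sketch: unwrap indented lines in a whole block of text.
-- A newline followed by spaces/tabs, together with that run of spaces/tabs,
-- becomes a single space; everything else is copied through unchanged.
unwrapText :: String -> String
unwrapText ('\n' : rest@(c : _))
  | isBlank c = ' ' : unwrapText (dropWhile isBlank rest)
unwrapText (c : rest) = c : unwrapText rest
unwrapText [] = []

-- Only spaces and tabs count as indentation, so a blank line (a newline
-- right after a newline) is not merged into the previous line.
isBlank :: Char -> Bool
isBlank c = c == ' ' || c == '\t'
```

Because unwrapText emits each output character as soon as it has looked at the input it needs, main = interact unwrapText should process STDIN incrementally under lazy IO rather than reading it all first.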
Oops, looks like the HTML sanitizer ate my characters.
So, the part that looks like @in = ; should look like @in = <>;
Andre: awesome, that’s exactly what I was looking for to be able to compare solutions.
Igor: Very cool!
Okay, I think I figured it out: http://neverfriday.com/temp/unwrap/unwrap.html
Also, I didn’t realize it was just one step in something larger. I’m justifying my choice to make this a small program by the fact that it would be easy to chain things together on a GNU/Linux system.
I mean, you *are* using cat, piping ;p
In Ruby:
$stdout << $stdin.read.gsub(/\n +/, "")
It does read the entire file into memory first, though.
Paul, I didn’t say it explicitly, but the goal is to create a pure function that unwraps indented lines only, preserving “real lines”.
There’s a typo in my solution ("" should be " "), but I’m not sure what you mean by “real lines”. If you mean that you want a pure function that maps lines with indented lines to lines with the indented lines concatenated, then you could do this:
http://gist.github.com/227510
But that’s not how I’d implement it in Ruby by choice!
Nice post. It was linked on the Haskell reddit: http://www.reddit.com/r/haskell/comments/a14mv/haskell_gem_unwrapping_indented_text/
More comments there.
Paul, very cool, thank you for both of your interesting solutions.
Yitz, thanks for the heads-up, and I like your insane one-liner.
I tried to write something that non-Haskellers could find interesting, but I do think your approach is totally rad.