Tuesday, May 4, 2010

Hubris and nested markup for text in the humanities

My attention is usually directed to alternative markup for the humanities, specifically poems and their alternative translations as presented on the internet in a web browser.

Consider a simple case: a poem of two stanzas (or verses) of six lines each.  The first four lines of verse ONE satisfy some criterion A and the last four lines of verse TWO satisfy some criterion C. Lines 5 through 8 satisfy some criterion B.  Proceed to mark it up using HTML, LML, XML or my preferred Curl.

One solution in a markup such as Curl, which has no "closing tag", e.g.,
{tag
  content
}
rather than
<tag>
  content
</tag>
might be this:
{A
line 1
line 2
line 3
line 4
|{B line 5
line 6
} || end of A (a comment starts with dbl-bars )

{C
line 7
line 8 |} || end of DOUBLE_BRACED  B  (a comment only)
line 9
line 10
line 11
line 12
} || end of C comment
Note that such a double-brace solution as |{  and |} is not available in Curl.  In mathematics, such bracketing would be nonsense.  But two stanzas are not two equations.
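
One way around the missing double brace is to keep the ranges outside the text altogether (often called stand-off annotation). What follows is only a minimal sketch in Python, not Curl. The range boundaries are my reading of the braces in the example above (A as lines 1 to 6, B as lines 5 to 8, C as lines 7 to 12), and the names `ranges`, `overlap` and `text_of` are hypothetical.

```python
# Stand-off annotation: the text is stored once, the ranges are stored apart.
# Ranges are inclusive, 1-based line numbers read off the braces in the
# example above (an assumption about the intended reach of A, B and C).
lines = [f"line {n}" for n in range(1, 13)]

ranges = {
    "A": (1, 6),
    "B": (5, 8),
    "C": (7, 12),
}


def line_numbers(name):
    """Return the set of line numbers covered by one named range."""
    start, end = ranges[name]
    return set(range(start, end + 1))


def overlap(first, second):
    """Return the sorted line numbers covered by both ranges."""
    return sorted(line_numbers(first) & line_numbers(second))


def text_of(numbers):
    """Look up the text for a list of 1-based line numbers."""
    return [lines[n - 1] for n in numbers]


print(text_of(overlap("A", "B")))  # ['line 5', 'line 6']
print(text_of(overlap("B", "C")))  # ['line 7', 'line 8']
print(overlap("A", "C"))           # []
```

Because no range has to nest inside another, B can straddle the stanza break without any special bracket. The cost is that the ranges no longer sit in the text where a reader can see them.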

In zoology, the existence of the species Canis lupus (wolf) and Canis latrans (coyote) yields to the fact that wolves and coyotes hybridize in nature. In the northeastern United States the question is what journalists should call these hybrid animals.  In the case of common tree species which readily hybridize, the challenge for botany is far greater.  Classification always cedes ground to the facts of genetics.

But in the simple case of a mere translation of a difficult two-stanza poem from German to English, markup falters.  This would not be the case with simple paint programs, where one "markup range" or "marginal note" translates readily into overlapping graphical figures.  Some are less effective than others: think of the reach of A above as red, C as blue and B as some shade of violet.  Overlapping transparent coloured rectangles, if you like.  But the "markup" on the page risks being unreadable.  Text has been abandoned to graphics.  Font triumphs over content, if you will.

The awkward XML solution is
<A>
<B>
</A>
<C>
</B>
</C>
which you may recognize as the overlapping Bold-Italic-Bold of a sentence above.  It is awkward because overlapping tags are not well-formed XML: you must now parse with specialized software to extract the part of A that also satisfies B.
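
To see why specialized software is needed, here is a small Python check using the standard library's `xml.etree.ElementTree`. It is only a sketch: the `poem` root element and the helper name `is_well_formed` are my own additions for illustration, since an XML document needs a single root.

```python
import xml.etree.ElementTree as ET


def is_well_formed(source):
    """Return True if a standard XML parser accepts the string."""
    try:
        ET.fromstring(source)
        return True
    except ET.ParseError:
        return False


# The overlapping form from above, wrapped in a hypothetical <poem> root.
overlapping = "<poem><A><B></A><C></B></C></poem>"

# A properly nested form for comparison: every element closes inside its parent.
nested = "<poem><A><B></B></A><C><B></B></C></poem>"

print(is_well_formed(overlapping))  # False
print(is_well_formed(nested))       # True
```

A stock parser stops at the first mismatched closing tag, so the overlapping form never reaches the stage where A and B could be queried.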

Consider this alternative:
<A>
<B>
</B>
</A>
<C>
<B>
</B>
</C>
What is now lost is the continuity of B.
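
One common patch is to give both fragments of B a shared label and rejoin them after parsing. The sketch below assumes a hypothetical `span` attribute, a hypothetical `l` element for each line and a hypothetical `poem` root; none of these come from the markup above, and only a few lines of each stanza are shown.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# B is split at the stanza break; both fragments carry the same span label.
source = """
<poem>
  <A>
    <l>line 4</l>
    <B span="b1"><l>line 5</l><l>line 6</l></B>
  </A>
  <C>
    <B span="b1"><l>line 7</l><l>line 8</l></B>
    <l>line 9</l>
  </C>
</poem>
"""

root = ET.fromstring(source)

# Collect the lines of every B fragment under its shared label,
# in document order, so that the split range reads as one again.
joined = defaultdict(list)
for fragment in root.iter("B"):
    label = fragment.get("span")
    joined[label].extend(line.text for line in fragment.iter("l"))

print(joined["b1"])  # ['line 5', 'line 6', 'line 7', 'line 8']
```

The continuity of B is recovered, but only by software that knows the convention; the markup on the page still shows two separate Bs.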

Visiting any major poetry or philosophy web site will reveal that very little is available to compare translations - because the problems of markup are daunting.  Arguably philosophy students are more penalized by attempts to mark up premises and suppressed premises relevant to an argument when the text is itself a challenge.  One example would be the limited efforts of philosopher Jonathan Bennett to make Early Modern philosophy texts more accessible to students.

For an example from poetry, take the poem by Werfel and the "translation" by Robert Lowell.
The solution which I find frightful is that of "lines" in poetry and "sentences" in philosophy.  Here is the two-stanza poem as lines:
{line num=1,stnza=1,feature="A", line 1 here} 
{line num=2,stnza=1,feature="A", line 2 here}
{line num=3,stnza=1,feature="A", line 3 here}
{line num=4,stnza=1,feature="A", line 4 here}
{line num=5,stnza=1,feature="B", line 5 here}
{line num=6,stnza=1,feature="B", line 6 here}
{br}{br}
{line num=7,stnza=2,feature="B", line 7 here}
{line num=8,stnza=2,feature="B", line 8 here}
{line num=9,stnza=2,feature="C", line 9 here}
{line num=10,stnza=2,feature="C", line 10 here}
{line num=11,stnza=2,feature="C", line 11 here}
{line num=12,stnza=2,feature="C", line 12 here}
It is frightful because I have now lost the elegant markup
{stanza
}

{stanza
}
in which each stanza had its lines - with or without line markup.
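
The stanzas can still be recomputed from the flat line records, even though the markup no longer shows them. Here is a rough Python sketch; the record layout mirrors the `num`, `stnza` and `feature` values above, the function names are hypothetical, and I assume feature C for lines 9 to 12, following the description of the poem at the start.

```python
from itertools import groupby

# One record per line: six lines per stanza, features A, B, C in runs of four.
records = [
    {"num": n, "stanza": 1 if n <= 6 else 2, "feature": f}
    for n, f in zip(range(1, 13), "AAAABBBBCCCC")
]


def stanzas(records):
    """Group consecutive line records by stanza number."""
    return {
        number: [r["num"] for r in group]
        for number, group in groupby(records, key=lambda r: r["stanza"])
    }


def feature_runs(records):
    """Collapse consecutive lines sharing a feature into (feature, first, last)."""
    runs = []
    for feature, group in groupby(records, key=lambda r: r["feature"]):
        nums = [r["num"] for r in group]
        runs.append((feature, nums[0], nums[-1]))
    return runs


print(stanzas(records))
# {1: [1, 2, 3, 4, 5, 6], 2: [7, 8, 9, 10, 11, 12]}
print(feature_runs(records))
# [('A', 1, 4), ('B', 5, 8), ('C', 9, 12)]
```

Both groupings come out of the same flat list, which is exactly the trade: nothing overlaps any more, but nothing is visibly a stanza either until a program reassembles it.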
