More information linked from:
     http://www.openoffice.org/issues/show_bug.cgi?id=11215
   http://www.xpertss.com/cgi-bin/WebX?23@101.nptzachifeG.0@13%40.ee735e2/0
and [ very interesting ]
   http://info.wgbh.org/upf/pdfs/BentoSpec1_0d5.pdf

Thunking also seems to happen with 0x01 & 0x02 characters
There is a ~unambiguous string run length
      => we must have some fixed-length records.


Framing from repeat sections: ?

000019f0  40 10 40 10 61 fb ff 70  01 45 2e 8c 03 04 12 01  |@.@.a..p.E......|
00001a00  40 10 40 10 61 f7 ff 70  08 40 01 50 03 54 82 03  |@.@.a..p.@.P.T..|
00001a10  04 12 01 40 10 40 10 61  f7 ff 70 01 45 2e 8c 03  |...@.@.a..p.E...|
00001a20  04 12 01 40 10 40 10 61  ef ff 70 08 40 03 50 04  |...@.@.a..p.@.P.|
00001a30  54 82 03 04 12 01 40 10  40 10 61 ef ff 70 01 45  |T.....@.@.a..p.E|
00001a40  2e 8c 03 04 12 01 40 10  40 10 61 df ff 70 08 40  |......@.@.a..p.@|
00001a50  05 50 05 54 82 03 04 12  01 40 10 40 10 61 df ff  |.P.T.....@.@.a..|
00001a60  70 01 45 2e 8c 03 04 12  01 40 10 40 10 61 bf ff  |p.E......@.@.a..|
00001a70  70 08 40 01 50 06 54 82  03 04 12 01 40 10 40 10  |p.@.P.T.....@.@.|
00001a80  61 bf ff 70 01 45 29 8c  03 04 12 01 40 10 40 10  |a..p.E).....@.@.|
00001a90  61 7f ff 70 08 40 03 50  07 54 82 03 04 12 01 40  |a..p.@.P.T.....@|
00001aa0  10 40 10 61 7f ff 70 01  45 29 8c 03 04 12 01 40  |.@.a..p.E).....@|
00001ab0  10 40 10 61 ff fe 70 08  40 05 50 08 54 82 03 04  |.@.a..p.@.P.T...|
00001ac0  12 01 40 10 40 10 61 ff  fe 70 01 45 29 82 03 04  |..@.@.a..p.E)...|
00001ad0  12 01 40 10 40 10 61 ff  fd 70 01 45 28 8c 03 04  |..@.@.a..p.E(...|

	  Unless - the tag is unimportant - just seeks the next recurrence (?)


 Odd section:
 02 02 3f bf 62 65 20 6f 6e 20 68 69 73 20 66 69 | ..?.be on his fi
 72 73 74 20 65 6e 74 65 72 69 6e 67 20 61 20 6e | rst entering a n
 65 69 67 68 62 6f 75 72 68 6f 6f 64 2c 20 74 68 | eighbourhood, th
 69 73 20 74 72 75 74 68 20 69 73 20 73 6f 20 77 | is truth is so w
 65 6c 6c 01 c0 02 53 53 45 6e 64 00 00 94 02 2e | ell...SSEnd.....
          ^^^^^^^^
 00 02 6a c1 02 2f 41 02 2d 48 0f 61 01 6d 46 8d | ..j../A.-H.a.mF.
 46 c3 01 f6 28 0e 41 01 96 40 01 42 | F...(.A..@.B

    Very unusual - _looks_ like there may be some overarching stream
encapsulation above this, with an underlying 2byte based framing (?)

    eg. could this be a directory entry & if so, is it possible that
some of the data is just uninitialised / random.

Field 0x01 (last 01) [0] length 152
  01 f6 98 ff bc ff bc ff bc ff bc ff bc ff bc ff | ................
  bc ff bc ff bc ff bc ff bc ff bc ff bc ff bc ff | ................
  bc ff bc ff bc ff bc ff bc ff bc ff bc ff bc ff | ................
  00 00 00 0d 00 4d 69 73 63 65 6c 6c 61 6e 65 6f | .....Miscellaneo
  75 73 00 00 1a 00 4d 00 69 00 73 00 63 00 65 00 | us....M.i.s.c.e.
  6c 00 6c 00 61 00 6e 00 65 00 6f 00 75 00 73 00 | l.l.a.n.e.o.u.s.
  46 69 6c 65 50 72 6f 74 65 63 74 69 6f 6e 00 48 | FileProtection.H
  65 61 64 65 72 00 4c 57 50 53 74 72 65 61 6d 54 | eader.LWPStreamT
  79 70 65 00 50 72 65 76 69 65 77 00 57 6f 72 64 | ype.Preview.Word
  50 72 6f 44 61 74 61 00 | ProData.
Field 0x01 (last 01) [0] length 1

    All (test) .lwp files include 'LWPStreamType' and 'WordProData'
strings.

    Saving a large .jpeg inside a .lwp file, results in a very large
.lwp file - apparently storing the data as a .bmp internally.

00002f60  00 44 c0 01 44 59 c6 a4  31 a2 41 25 1a 49 24 2b  |.D..DY..1.A%.I$+|
00002f70  49 d6 2c 48 0c af 59 c6  a4 31 e1 fa 4f 41 bd 5d  |I.,H..Y..1..OA.]|
00002f80  d9 31 51 76 bc 31 ca 26  2f 32 c9 30 2f 32 aa e0  |.1Qv.1.&/2.0/2..|
00002f90  11 32 7b 18 b8 32 45 19  37 32 ad 65 fe 31 a7 63  |.2{..2E.72.e.1.c|
00002fa0  fe 31 7b 52 30 32 42 4d  da 20 5d 00 00 00 00 00  |.1{R02BM. ].....|
00002fb0  36 04 00 00 28 00 00 00  0c 0d 00 00 23 07 00 00  |6...(.......#...|

    Indeed, chopping the prepended data out, gives a pristine .bmp; 
        => the framing structure can cope with large chunks / lengths
        => there is no escaping of (binary) stream contents
	=> there must be a directory / length description / offset
	   table somewhere.

    OLE2 support - I can insert an OO.o formula editor piece.
	=> there must be some flattening of OLE2 Storage ->
	   a Stream, streams seem to be appended.

.SAM file has [ right at the end ] a name-less directory:
	[Embedded]
	1 .ole 10916 7462 0 0
	2 .ole 18378 11707 0 0
	3 .ole 30085 8382 0 0 
	00038469

	-the .sam file length is 38564

    ole.lwp file has last 8 bytes:
000085a0  a4 43 4d a5 48 64 72 d7  01 01 01 00 02 00 00 00  |.CM.Hdr.........|
000085b0  84 81 00 00 1c 04 00 00                           |........|
	       81 84 + 04 1c = 85a1

    austin.lwp:
000333c0  00 18 ff ff a4 43 4d a5  48 64 72 d7 01 01 01 00  |.....CM.Hdr.....|
000333d0  02 00 00 00 b0 32 03 00  14 01 00 00              |.....2......|
000333dc
    0332b0 +0114 = 333c4

    Looks like the last 4 bytes could be an offset skipping the
Header, followed by a length of data after that.
    Interestingly in both instances, the data is aligned
	=> all .lwp files are a multiple of 4 bytes long;
	    [ seems to hold for our test files ]
	=> there must be some internal pad / alignment adjustment

Lotus hosted junk:

Smartsuite:
	http://www-128.ibm.com/developerworks/lotus/products/smartsuite/
Lotus Word Pro <x.y> docs:
	http://www-10.lotus.com/ldd/notesua.nsf/ddaf2e7f76d2cfbf8525674b00508d2b/8454a86d0399e63c00256c170040be11?OpenDocument
fixing corrupted files (?)
       http://www-1.ibm.com/support/docview.wss?rs=481&context=SSKTZ5&uid=swg27002671&loc=en_US&cs=utf-8&lang=en



From: 	Tim Reid  <timothymreid@hotmail.com>
To: 	Michael Meeks <michael@ximian.com>
Subject: 	Word Pro files

Well, I got somewhere with this, but hardly that you'd notice ...

I enclose some text in ASCII, Word Pro format and Ami Pro format. Also, I'm
enclosing my doodlings on a chunk from the Word Pro file. It seems as if the
text itself is stored in pieces, with some length information. I'm going to
call each of the pieces a scrap (for want of a word). A scrap seems to be:

  o one byte of <length>
  o one byte of <something>
  o some ASCII characters totalling <length> displayable characters,
        but possibly also containing some control characters (e.g. 0xc6,
        0xc2) embedded directly in the text.

The file working.txt shows this. Non-printable characters (and a few
printable ones as well) have been replaced with #nn where nn is the hex
representation of the byte. The only other thing I've added is a double
colon at the beginning of lines which represent a complete scrap, that is
the number of displaying characters is verified correct.

You'll see that most lines are represented by one scrap. However, in a few
cases, a line is split up into several scraps. "Mr. Bennet" seems to suffer
from this particularly badly, but "Mrs. Long" seems fine. Hmmm.

Also there is a 0x0202 (shown as #02#02) before most scraps, and seemingly
always before scraps which start a line.

The long parts like this:

#01#02SSEnd#00#00#94#028#00#02#20#c1#029#41#027#48#19a#01mF#8dF#c3#01#f6#28#
0e#41#01#96#40#01#41

seem to come after each line (possibly each paragraph, as the pasted-in text
comes with embedded newlines, of course - I've only just twigged that) and
two where there is a blank line in the text. Note the one-up counter things
at bytes ten and sixteen, after #02.

Right, better go. I hope that this is of some help, at least. Let me know
what you think.



* Lotus Word Pro - part of SmartSuite, originally named 'Ami Pro'
    + .lwp - lotus word pro
    + .sam - Ami Pro file (from company Samna)

* Interesting (?)
    http://www.abisource.com/mailinglists/abiword-dev/02/Mar/0387.html
    http://webcvs.kde.org/cgi-bin/cvsweb.cgi/koffice/filters/kword/amipro/

* Format dissimilar to MS Word format:
	formatting there seems to be a separate appendage;
	no useful comparison.

* Austen annotations:

It is a truth universally acknowledged, that a single man in

#01#02SSEnd#00#00#94#02*#00#02#5c#c1#02+#41#02#29#48#0ba#01mF#8dF#c3#01#f6#28#0e#41#01#96#40#01B


#02#02
::#38#b9possession of a good fortune, must be in want of a wife.

#01#02SSEnd#00#00#94#02+#00#02#20#c1#02,#41#02*#48#0ca#01mF#8dF#c3#01#f6#28#0e#41#01#96#40#01#41
#01#02SSEnd#00#00#94#02,#00#02`#c1#02-#41#02+#48#0da#01mF#8dF#c3#01#f6#28#0e#41#01#96#40#01B

#02#02
::#3c#bdHowever little known the feelings or views of such a man may

#01#02SSEnd#00#00#94#02-#00#02d#c1#02.#41#02,#48#0ea#01mF#8dF#c3#01#f6#28#0e#41#01#96#40#01B

#02#02
::#3f#bfbe on his first entering a neighbourhood, this truth is so well

#01#c0#02SSEnd#00#00#94#02.#00#02j#c1#02#2f#41#02-#48#0fa#01mF#8dF#c3#01#f6#28#0e#41#01#96#40#01B

#02#02
::#45#bffixed in the minds of the surrounding families, that he is consi#c6dered

#01#02SSEnd#00#00#94#02#2f#00#02b#c1#020#41#02.#48#10a#01mF#8dF#c3#01#f6#28#0e#41#01#96#40#01B

#02#02
::#3e#bfthe rightful property of some one or other of their daughters.

#01#02SSEnd#00#00#94#020#00#02#20#c1#021#41#02#2f#48#11a#01mF#8dF#c3#01#f6#28#0e#41#01#96#40#01#41
#01#02SSEnd#00#00#94#021#00#02v#c1#022#41#020#48#12P#0c#41#01mF#8dF#c3#01#f6#28#0e#41#01#96#40#01B

#02#02
::#0d#91"My dear Mr. 

#82#02#04#12#01#40#20#40#20#40#20#0b#c0
::#08#8aBennet,"

#02#02
::#28#a9 said his lady to him one day, "have you

#01#02SSEnd#00#00#94#022#00#02e#c1#023#41#021#48#13P#0c#41#01mF#8dF#c3#01#f6#28#0e#41#01#96#40#01B

#02#02
::#0b#8fheard that 

#82#02#04#12#01#40#20#40#20#40#20#0b#c0#0b#8dNetherfield

#02#02
::#16#97 Park is let at last?"
#01#02SSEnd#00#00#94#023#00#02#20#c1#024#41#022#48#14a#01mF#8dF#c3#01#f6#28#0e#41#01#96#40#01#41
#01#02SSEnd#00#00#94#024#00#02#5c#c1#025#41#023#48#15P#0c#41#01mF#8dF#c3#01#f6#28#0e#41#01#96#40#01B

#02#02
::#04#88Mr. 

#82#02#04#12#01#40#20#40#20#40#20#0b#c0

::#06#88Bennet

#02#02
::#19#9a replied that he had not.

#01#02SSEnd#00#00#94#025#00#02#20#c1#026#41#024#48#16a#01mF#8dF#c3#01#f6#28#0e#41#01#96#40#01#41
#01#02SSEnd#00#00#94#026#00#02f#c1#027#41#025#48#17a#01mF#8dF#c3#01#f6#28#0e#41#01#96#40#01B

#02#02
::#41#bf"But it is," returned she; "for Mrs. Long has just been here, an#c2d

#01#02SSEnd#00#00#94#027#00#02>#c1#028#41#026#48#18a#01mF#8dF#c3#01#f6#28#0e#41#01#96#40#01B

#02#02
::#1a#9bshe told me all about it."

#01#02SSEnd#00#00#94#028#00#02#20#c1#029#41#027#48#19a#01mF#8dF#c3#01#f6#28#0e#41#01#96#40#01#41
#01#02SSEnd#00#00#94#029#00#02S#c1#02:#41#028#48#1aP#0c#41#01mF#8dF#c3#01#f6#28#0e#41#01#96#40#01B

#02#02
::#04#88Mr. 

#82#02#04#12#01#40#20#40#20#40#20#0b#c0

::#06#88Bennet

#02#02
::#10#91 made no answer.

#01#02SSEnd#00#00#94#02:#00#02#20#c1#02;#41#029#48#1ba#01mF#8dF#c3#01#f6#28#0e#41#01#96#40#01#41
#01#02SSEnd#00#00#94#02;#00#02^#c1#02<#41#02:#48#1ca#01mF#8dF#c3#01#f6#28#0e#41#01#96#40#01B

#02#02
::#3a#bb"Do you not want to know who has taken it?" cried his wife
