Hi Tony,
Your ideas are all sound and sensible.
And we do want "roundtrips" to and from every possible content
format. The pandoc project is a very good example / implementation of
this workflow, with the exact same problems. (That's why they created their
own MD extensions.)
But
if we really want something reliable, *we can't allow data loss in the
"storage format"!*
---------------- Some more internals and background
If you dig deeper into the topic you will inevitably end up at SGML
<https://en.wikipedia.org/wiki/Standard_Generalized_Markup_Language>
(Standard Generalized Markup Language). SGML's roots go back to the 1960s.
Simplified: it is a language / standard for defining other languages. To
achieve this, it has to accept quite a bit of complexity.
"SGML was originally designed to enable the sharing of machine-readable
large-project documents in government, law, and industry. Many such
documents must remain readable for several decades, a long time in the
information technology field."
... source: Wikipedia
You may ask: why not just implement SGML as our storage format, and all
problems are solved? IMO because our heads would explode. At least mine ;)
SGML is really hard and, by design, absolutely not human friendly ... see
the quote above ;)
That's why several "easier to handle" dialects have evolved. One of them is
XML (still clunky and human unfriendly).
Once there was an attempt to force XML on the web with XHTML. But the
rules were so strict that the browsers that implemented it broke the
existing web. The project blew up in their faces.
So we ended up with HTML5, which is a standard that is "just good
enough": easy to use for machines and OK for humans.
----------------
As you found out, HTML is a relatively good format to store "structured"
text. Note the word "relatively": it was defined for a very specific
use case, "The Web".
TiddlyWiki actually mis-uses it to store tiddlers in HTML format. If
you open a tiddlywiki.html file you can find elements like this (scroll
down to line ~10.000++):
<div author="JeremyRuston" core-version=">=5.0.0"
 dependents="$:/themes/tiddlywiki/snowwhite"
 description="Centralises the story river" name="Centralised"
 plugin-type="theme" title="$:/themes/tiddlywiki/centralised"
 type="application/json" version="5.1.16-prerelease">
<pre>{
html-escaped content goes here: < is encoded as &lt; and > is encoded as &gt; ....
</pre>
</div>
This is the system tiddler $:/themes/tiddlywiki/centralised, where I
replaced the content with a one-liner.
At TW startup this content is transferred into something much more useful
for JavaScript developers: JSON <https://en.wikipedia.org/wiki/JSON>
(JavaScript Object Notation). It's easier to handle, because it is already
part of the JS language itself. No bulky 3rd-party libraries are needed.
JS itself is just fine.
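To make that a bit more concrete, here is a minimal sketch of the idea. It is not how the TiddlyWiki boot code actually does it: `decodeHtml` is a hypothetical helper, the object only copies a few of the attributes from the div shown above, and the `text` value is made up for illustration.

```javascript
// Hypothetical helper (not a TiddlyWiki API): decode the few HTML entities
// that show up in the stored text.
function decodeHtml(text) {
  return text
    .replace(/&lt;/g, "<")
    .replace(/&gt;/g, ">")
    .replace(/&quot;/g, '"')
    .replace(/&amp;/g, "&"); // decode &amp; last, so "&amp;lt;" becomes "&lt;"
}

// A few attributes of the div from above, written as a plain JS object.
// The "text" content is an invented one-liner.
const tiddler = {
  title: "$:/themes/tiddlywiki/centralised",
  name: "Centralised",
  "plugin-type": "theme",
  type: "application/json",
  text: decodeHtml("{ &quot;note&quot;: &quot;a &lt; b&quot; }")
};

// JSON support is built into the language, no library needed.
const stored = JSON.stringify(tiddler);
const restored = JSON.parse(stored);

console.log(restored.text); // prints: { "note": "a < b" }
```

The round trip through `JSON.stringify` and `JSON.parse` gives back an object with the same fields, which is the "no data loss" property we want from a storage format.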
The cool thing about JSON is that it can also describe structures very
well, in a programmer-friendly / readable format.
http://www.jsonml.org/ has some examples where an HTML table is described
as JSON. Web programmers love JSON because it is "just good enough" to
be extremely useful. Programs can handle JSON descriptions very well.
"The purpose of JsonML is to provide a compact format for transporting
XML-based markup as JSON which allows it to be losslessly converted back
to its original form."
... source: jsonml.org
IMO the page is outdated, and there are probably newer projects, but the
idea behind it is the right one.
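As a rough illustration of that idea (my own sketch, not code from jsonml.org): a JsonML node is an array of the form [tag, optional attributes object, ...children], and text is a plain string. The function names `escapeHtml` and `jsonmlToHtml` are made up for this example, and void elements like <br> are not handled.

```javascript
// Escape characters that have a special meaning in HTML.
function escapeHtml(text) {
  return String(text)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

// Convert one JsonML node into an HTML string.
function jsonmlToHtml(node) {
  if (typeof node === "string") {
    return escapeHtml(node); // text node
  }
  const [tag, ...rest] = node;
  let attrs = {};
  // The second array entry is an attributes object if it is a plain object.
  if (rest.length > 0 && rest[0] !== null &&
      typeof rest[0] === "object" && !Array.isArray(rest[0])) {
    attrs = rest.shift();
  }
  const attrText = Object.entries(attrs)
    .map(([name, value]) => ` ${name}="${escapeHtml(value)}"`)
    .join("");
  return `<${tag}${attrText}>${rest.map(jsonmlToHtml).join("")}</${tag}>`;
}

const table = ["table", { class: "demo" },
  ["tr", ["td", "a"], ["td", "b < c"]]
];

console.log(jsonmlToHtml(table));
// prints: <table class="demo"><tr><td>a</td><td>b &lt; c</td></tr></table>
```

Because every tag, attribute and text node has a place in the array structure, nothing is dropped on the way to HTML.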
TiddlyWiki uses JSON to convert
- tw-syntax -> into the parse-tree
- the parse-tree is translated into -> the widget-tree
- the widget-tree is translated into -> HTML output
If you open https://tiddlywiki.com/prerelease/ and copy the following
text into a tiddler:
* line 1
* element 2
Open the preview panel
and select parse-tree;
Mode:block
[
    {
        "type": "element",
        "tag": "ul",
        "children": [
            {
                "type": "element",
                "tag": "li",
                "children": [
                    {
                        "type": "text",
                        "text": "line 1"
                    }
                ]
            },
            {
                "type": "element",
                "tag": "li",
                "children": [
                    {
                        "type": "text",
                        "text": "element 2"
                    }
                ]
            }
        ]
    }
]
For this example the widget-tree looks similar, but if you enter
{{HelloThere}} you'll get a huge difference between the parse-tree and the
widget-tree.
Tree-like formats such as these are relatively easy for algorithms to handle.
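To show what "easy to handle" means, here is a small sketch that walks the parse-tree from above and produces HTML. It is only an illustration and not the TiddlyWiki renderer (which goes through the widget-tree); `renderNode`, `renderNodes` and `escapeText` are names I made up, and only "element" and "text" nodes are supported.

```javascript
// The parse-tree from the example above.
const parseTree = [
  {
    type: "element",
    tag: "ul",
    children: [
      { type: "element", tag: "li", children: [{ type: "text", text: "line 1" }] },
      { type: "element", tag: "li", children: [{ type: "text", text: "element 2" }] }
    ]
  }
];

// Escape characters that have a special meaning in HTML text.
function escapeText(text) {
  return text.replace(/&/g, "&amp;").replace(/</g, "&lt;").replace(/>/g, "&gt;");
}

// Render a single node; recursion handles any nesting depth.
function renderNode(node) {
  if (node.type === "text") {
    return escapeText(node.text);
  }
  if (node.type === "element") {
    return `<${node.tag}>${renderNodes(node.children || [])}</${node.tag}>`;
  }
  throw new Error("Unsupported node type: " + node.type);
}

// Render a list of sibling nodes.
function renderNodes(nodes) {
  return nodes.map(renderNode).join("");
}

console.log(renderNodes(parseTree));
// prints: <ul><li>line 1</li><li>element 2</li></ul>
```

One recursive function per node type is all it takes, which is why tree formats are so popular as an intermediate step.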
The problem with the TW parse-tree at the moment is that it "loses
information". Maybe for performance reasons. Also, implementation is
faster and complexity is much lower if you skip features.
So we actually can't use it as a storage format.
BUT
If we solved that problem, we could transform TW-syntax <-> Markdown
<-> HTML <-> TW-syntax <-> [you name it]
Such an intermediate storage format would be called an AST
<https://en.wikipedia.org/wiki/Abstract_syntax_tree> (Abstract Syntax Tree).
While ASTs are mostly known from programming languages, the mechanism we
need is the same. Our parse-tree and widget-tree basically are ASTs.
Some transformation paths exist and some _don't_, e.g.:
- tw-syntax -> AST -> HTML ... exists
- HTML -> AST -> tw-syntax ... doesn't exist
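Just to show what the second half of the missing path (AST -> tw-syntax) could look like, here is a minimal sketch for bullet lists only. It is an assumption-heavy toy and not existing TiddlyWiki code: `toWikiText` is a hypothetical function, it only understands "ul", "li" and "text" nodes, and the HTML -> AST half would still need a real HTML parser.

```javascript
// A tree in the same shape as the parse-tree shown earlier.
const listTree = [
  {
    type: "element",
    tag: "ul",
    children: [
      { type: "element", tag: "li", children: [{ type: "text", text: "line 1" }] },
      { type: "element", tag: "li", children: [{ type: "text", text: "element 2" }] }
    ]
  }
];

// Hypothetical serializer: turn ul/li nodes back into tw-syntax bullets.
// "depth" is the list nesting level and becomes the number of "*" characters.
function toWikiText(nodes, depth = 0) {
  let out = "";
  for (const node of nodes) {
    if (node.type === "element" && node.tag === "ul") {
      out += toWikiText(node.children, depth + 1);
    } else if (node.type === "element" && node.tag === "li") {
      const text = node.children
        .filter((child) => child.type === "text")
        .map((child) => child.text)
        .join("");
      out += "*".repeat(depth) + " " + text + "\n";
      // Nested lists inside this item are written one level deeper.
      out += toWikiText(node.children.filter((child) => child.type !== "text"), depth);
    } else {
      // Anything else would be silently lost, so fail loudly instead.
      throw new Error("Cannot serialize node: " + JSON.stringify(node));
    }
  }
  return out;
}

console.log(toWikiText(listTree));
// prints:
// * line 1
// * element 2
```

Throwing on unknown nodes is deliberate: a converter that quietly drops what it doesn't understand is exactly the kind of data loss we want to avoid.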
hope that helps
have fun!
mario