Computers and people: which has more accurate formatting?

Original Poster

13,841 posts

233 months

Wednesday 13th February 2008

Yes, it's about formatting and HTML.

HTML and XML use matched tags: a <begin> opening tag and a </begin> closing tag.

On forums like this, you can type in:

 [pic]picture.jpg[/quote]

PH will misinterpret this.
So my puzzle is: why does it do this?
Why not ignore what is in the closing tag entirely?
The closing tag content is unreliable, irrelevant and unnecessary.

PH is not the only forum doing this, but really it's a bug that should no longer exist. In fact you can write entire websites without closing tag content. It is a hangover from HTML and IMO needs to be dropped. thumbup

In addition, a computer can count much better than a person, so it can add closing tags as required. That stops a contributor's markup mistake from messing up the look of a page or preview.

It's all down to how much technical skill you demand from users, versus how much technical skill you actually need them to have, two very different things biggrin

Edited by Mr Will to stop breaking everyone's quotes! wink

Edited by Mr Will on Wednesday 13th February 11:06

zaktoo said:

The only way I can see to solve it is not to have the software ignore tags which need to be there, but for PH to provide a WYSIWYG inline editor for composing replies and posts. That can generate correctly formatted content without requiring too much knowledge from the user.

In fact, if anyone is reading this who thinks it a good idea, I'll put my hand up to code such a thing...

Dilbert's quote above really proves my point, doesn't it? biggrin

No closing tag content is needed at all. You just need the opening tag; the closing tag needs to exist, but its content is entirely redundant and sometimes misleading.

Dilbert:
True, HTML does not always mandate closing tags. I merely stated that the forum-text formatting only has [opening] and [/closing] tags that hopefully match because the coders looked at HTML and copied it without thinking it through.
You only need a simple stack to follow which tag you are on.

dilbert said:

The point is that where a compulsory end tag is present or absolutely known to be absent, the execution stack and the managed stack can be the same. No processing has to be done to establish the relationship between the execution stack and the managed stack.

I'm not sure how the execution stack comes into it - sorry my post was unclear.
Using a simple stack to keep track of what is on the page is not only easy, but I've proven it can be done on a huge scale: ALL CONTENT on This Website is generated by a simple forum text system that uses empty braces to terminate tags. Wot I wrote.

For instance, an italic item will be [i]this is italic[].

I agree that you cannot overlap tags, i.e. do something like:
[b]bold [i]and italic[/b] or just italic[/i]

- but no one ever does that. To get the same effect you would just use:
[b]bold [i]and italic[][][i] or just italic[].
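To make the empty-brace idea concrete, here is a minimal sketch of such a converter in Python. It is not the actual CuteText code (which the post says is written in PHP); the function name `render` and the tiny tag table are assumptions made for illustration, and only [b] and [i] are handled.

```python
import html
import re

# Hypothetical tag table: only [b] and [i] are covered in this sketch.
HTML_TAGS = {"b": "b", "i": "i"}

# Matches [name] for lowercase names, and the empty closing tag [].
TOKEN = re.compile(r"\[([a-z]*)\]")


def render(text):
    """Convert [b]...[] style markup to HTML using a stack of open tags."""
    out = []
    stack = []
    pos = 0
    for match in TOKEN.finditer(text):
        # Plain text between tags is escaped so it cannot inject HTML.
        out.append(html.escape(text[pos:match.start()]))
        pos = match.end()
        name = match.group(1)
        if name == "":
            # Empty braces close whatever was opened most recently.
            if stack:
                out.append("</%s>" % stack.pop())
        elif name in HTML_TAGS:
            stack.append(HTML_TAGS[name])
            out.append("<%s>" % HTML_TAGS[name])
        else:
            # Unknown tag: pass it through as literal text.
            out.append(html.escape(match.group(0)))
    out.append(html.escape(text[pos:]))
    return "".join(out)


print(render("[b]bold [i]and italic[][][i] or just italic[]."))
```

The user never names the tag being closed; the stack already knows which one it is.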

One hidden advantage is the huge speed at which you can bash out content. You never have to remember which tag you are closing, reach for the slash key or type unnecessary characters. Try typing this yourself:

[+1]This is big text [b]in bold[][]

- fast eh?

To maintain backward compatibility you can just treat anything with a / in it (any closing tag) as an empty tag. I think you will struggle to find a single post or article that needs those closing tags to be filled in.
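As a rough sketch of that backward-compatibility step, a single substitution can rewrite every named closing tag to the empty form before parsing. The function name `normalise_closing_tags` is hypothetical, and the sketch assumes a closing tag is one where the slash immediately follows the opening bracket, so that tags which merely contain a URL are left alone.

```python
import re

# A closing tag here is "[/" followed by anything up to the next "]".
CLOSING = re.compile(r"\[/[^\[\]]*\]")


def normalise_closing_tags(text):
    """Replace every [/anything] with the empty closing tag []."""
    return CLOSING.sub("[]", text)


print(normalise_closing_tags("[pic]picture.jpg[/quote]"))
```

After this pass the name inside the closing tag no longer matters, so a mismatched [/quote] behaves exactly like [/pic].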

dilbert said:

What you're suggesting seems like a nice idea, but the question arises about what happens when someone does the thing that you say nobody ever does. I agree, it's sweet all the time they don't, but when they do..... What then?

Well forum text is completely under the control of the parser, so this is not a problem at all (unless the parser code is buggy).

If someone messes up and tries
[b]bold [i]and italic[] or just italic[]
instead of
[b]bold [i]and italic[][][i] or just italic[]

Then he just gets this: bold and italic or just italic (with the final 'or just italic' in bold, because the last [] closes the [b])

The effect (and whole concept in fact) is that the parser is filling in the closing tag content for the user, based on a simple stack - for this statement the stack goes

[push b]bold [push i]and italic[pop (i)][pop (b)][push i] or just italic[pop (i)]
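The push/pop trace above can be reproduced mechanically. The following Python sketch (the function name `trace` is an assumption, not part of any real forum code) rewrites each tag as the stack operation it causes.

```python
import re

# Matches [name] for lowercase names, and the empty closing tag [].
TOKEN = re.compile(r"\[([a-z]*)\]")


def trace(text):
    """Rewrite each tag as the stack operation it triggers."""
    stack = []

    def step(match):
        name = match.group(1)
        if name:
            stack.append(name)
            return "[push %s]" % name
        if stack:
            return "[pop (%s)]" % stack.pop()
        # A stray [] with nothing open is reported rather than crashing.
        return "[pop (nothing)]"

    return TOKEN.sub(step, text)


print(trace("[b]bold [i]and italic[][][i] or just italic[]"))
print(trace("[b]bold [i]and italic[] or just italic[]"))
```

Running it on the mistaken version shows the final [] popping b instead of i, which is exactly why the output is well-formed but not what the user meant.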

dilbert said:

Whilst it's easy to see in a short little two liner, where the problem has come from, in a monster page, it's much more difficult. I would agree that all you have to do is look for the first error in the rendered output, but there are lots of circumstances where this is less than ideal.

It works fine for huge pages, but you have a good point about errors. My point is that a) the problem already exists, and is made worse by users using the wrong tag, and b) thanks to the stack, you ALWAYS know how to close out, so you can always guarantee well-formed HTML, even if the formatting is sometimes not what the user wanted. No parser can be psychic though biggrin

dilbert said:

If the source is automatically generated from a number of places, you can't easily figure out if the thing you did is ok, if someone else made an error in a part that they produced earlier.

Actually it is dead easy to spot - the format errors stand out on the page for you to see.

dilbert said:

SGML does not have this limitation. End tags are named. A compulsory end tag will close any amount of child tags, and levels of child tags, in order to match its opening tag.

Once the problem has occurred, with a scheme such as you propose, it's very difficult to pick things up again. Once the context is burst, the sense is gone. SGML is resilient.

SGML is only resilient if the input deck is not corrupted - but saying this, PH does not do any error checking of incorrect ending tags anyway. [b]Does it[/i] biggrin

Mr Will said:

If I was implementing a system like yours, I would be tempted to scrap closing tags altogether and go for something more like this:

 [b=this text should be bold]

the '[' character would denote that everything up to the '=' is a tag, and the ']' marks the end of the formatting.

It would still have the same weaknesses but would be even quicker to type and much easier to read, plus bracket matching would make it easier to debug.

Actually I do use that for atomic items, for instance a picture is [Ppicture.jpg] or [Fhttp://somesite/flash/widget.swf]. For general items I think you would run into nesting complications and lose your place on big pages, for instance with an indented paragraph ([I]paragraph[]) or tables etc.
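For comparison, Mr Will's bracket-only syntax quoted above could be handled with the same kind of stack. This is only a sketch under assumptions: the function name `render_inline` is made up, a tag is taken to be an alphanumeric name between '[' and '=', and the name is used directly as the HTML element.

```python
def render_inline(text):
    """Render [tag=content] markup, where ']' closes the innermost tag."""
    out = []
    stack = []
    i = 0
    while i < len(text):
        ch = text[i]
        # Only look for '=' when we are sitting on an opening bracket.
        eq = text.find("=", i) if ch == "[" else -1
        if eq != -1 and text[i + 1:eq].isalnum():
            # Everything between '[' and '=' is the tag name.
            stack.append(text[i + 1:eq])
            out.append("<%s>" % stack[-1])
            i = eq + 1
        elif ch == "]" and stack:
            out.append("</%s>" % stack.pop())
            i += 1
        else:
            # Ordinary character, or a bracket that is not part of a tag.
            out.append(ch)
            i += 1
    return "".join(out)


print(render_inline("[b=this text should be bold]"))
```

Nesting works, but on a long page every ']' looks the same, which is the "losing your place" problem mentioned above.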

The only changes I'm really suggesting for PH are a) to ignore the closing tag content and allow an empty tag to be treated as a closing tag, and b) to tidy up afterwards if someone misses any closing tags (i.e. while (pop(stack)) close_html(stack))
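The tidy-up loop in b) might look something like this in Python. The function name `close_open_tags` and the idea of passing the leftover stack as a list are assumptions for the sake of a self-contained example.

```python
def close_open_tags(html_so_far, open_tags):
    """Append a closing tag for everything still open, innermost first."""
    stack = list(open_tags)  # copy, so the caller's list is not modified
    parts = [html_so_far]
    while stack:
        parts.append("</%s>" % stack.pop())
    return "".join(parts)


# The user opened [b] and [i] and then forgot to close either of them.
print(close_open_tags("<b>bold <i>and italic", ["b", "i"]))
```

Because tags are popped in reverse order of opening, the result is always properly nested, whatever the user left unclosed.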

Mr Will said:

Out of interest, have you ever tried programming in python?

Yes I have, I have to say that I do like brackets, as in PHP and C. Perl of course enforces brackets, whereas C and PHP have the friendly concept of statements smile

I can get used to Python, but in Vi brackets make navigation so much easier smile

CuteStudio Ltd is the company, the product itself is called 'Silk', but it is not really a client-server system, I think that title is too grand, it is more a presentation format biggrin

Docs are Here for browsing

I've just stuck the reference on so you can read it; as a rule it doesn't tend to live there, however.

It is really a type of wiki, but expands the idea to encompass the entire site, so you can change layout, style etc on a level by level basis - which you will see as you view the module docs. Currently it's finished enough to present stuff with and so I can whack stuff up onto the net, but does not have any remote editing facility. The main reason I wrote it was in fact for lo-hassle presentation - which it seems to do well for me.

It's the type of thing for intranets and stuff I guess, but the main feature was the ability to hack simple text files to create rich pages and content, you'll notice as you browse I think.

To see the source for any page, at the URL just change /index.php for /page.txt (and /layout.txt for containers - but there is only one at the top level IIRC). Local /style.css files vary the style as one descends into the site.

The Text module is the one that contains the CuteText format that uses no closing tag biggrin

You can also see that text source BTW if you look at the corresponding /page.txt

It also does stuff like showing the size of downloadable files automatically etc, which is verified by the site loading tools offline. In fact, to add the reference part I had to:
1. Copy that directory over
2. Click on 'build site'
3. Click on 'verify site'
4. Click on 'sync upload site'

i.e. this post took considerably longer biggrin

dilbert said:

I think that's the thing that makes it interesting. I understand that you're dealing with the presentation stuff. TBH it doesn't interest me too much (to have to code it) because it's just a different way of doing the same thing. I would always choose what I would see as the most generic way (SGML) because it maximises the potential for reuse.

The thing is though that there are a lot of people out there who want to make things simpler, quicker, easier, and possibly just different but these things are important too.

I suppose the app I posted here is similar to frontpage, but as a client server solution that can be expensive, depending on what you want to do. Also what I'm doing doesn't have the breadth of something like frontpage. I see that app as being a simple console to the server rather than the environment in which you would want to develop.

Frontpage et al, will often allow you to work with images. I can put a browser window on the app, and I can also offer quite a bit of image processing capability. I can deal with gif, jpeg, and bmp, but I don't yet have png or tiff input capability, and I don't really want to write an image editing application.

Unlike frontpage this solution does not lead you into .NET, which I don't like, and I think a lot of others don't like either. Explicitly, this solution is designed to work with php.

There are a whole raft of reasons to make such a capability web based, but there are also a whole load of reasons why you might not want to do that. Certainly I don't think I would want to have to use a web interface to update my pages every day.

One of the important capabilities that it offers is the ability to upload a page and its references to the server automatically. The same is true for downloads.

The server is content managed, in the sense that it supports permalinking. One of the capabilities that the console has is to translate a working page on your disk, into a validated, permalinked page on the server.

The server achieves this through a similar scheme to your own for metadata. The server can produce little CSV files which automatically reflect the site structure, and are easily used to create a page frame in PHP.

On windows the server is ISAPI, so it's quite quick. All that's necessary on the "web server" is a single php file that calls into the "document server". By this means you only need to use FTP once to put the stub on the "webserver".

I'm not sure I 100% understand what your app is. Would I be right in guessing it is a client-server document access system, via http, akin to the web but using a different protocol to the usual HTML/CSS?

If it powers a website I'd be interested to have a look-see!

My CuteText renders straight to XHTML/CSS by PHP routines but I may end up using it to power some help pages in apps too. I see you are using C++, I'm a bit hopeless with that so I tend to stick to C.

dilbert said:

I'm critically conscious that Linux is a (very) cheap alternative to windows, and a linux version of the server is definitely planned, for use with apache. But such things will depend wholly on any interest (or otherwise) that I get.

The client is obviously windows, and I don't see that as such an important problem, but I'm hoping that Wine is going to be my friend there. Ultimately though I guess I'll always be a 'dozer! (Mac ooer - what's that?)

I think Wine does a good job for Linux, especially if you compile your sources in with it. I write software for Mac/Linux/Windows, which is simple with text code using MinGW, but for GUI one has to fork at the graphics level into Win32 or X-windows (or Mac native, but I haven't gotten around tuit yet smile).

Funnily enough for graphics Win32 is really very good whereas Xwindows is completely pants (rotated text anyone? smile), but for program development I always prefer the Unix command line and tools. I'm not sure Wine will allow Mac access though, although I'd be surprised if no one had done it. It can't be rocket science smile

One reason I chose W3C HTML/CSS/Javascript was to get browser compatibility, just so I could avoid getting into GUI coding for presentation stuff, as experience showed that all the cross platform porting and testing often took far longer than anything else. Therefore ALL my documentation etc just tends to be straight HTML/CSS - so I can publish it on the web or scoop it up into wget or a site reader and stick it onto a CD or zip file. Some things you simply can't do that way however!

dilbert said:

Crazily this was only intended as a capability to support my upcoming website, but as time has progressed it's looked more like it ought to be a product. More oddly some of the other stuff I'm working on is related to signal processing, like your sound studio, but purely IIR/FIR filters, design, and test.

I'm more of an analog electronics person myself, despite dabbling in digital, although I've got to integrate a FIR filter soon into DeClip to allow it to upsample to different bit rates.
And yes, soon I'll be gui-ing for that too, no rest for the wicked biggrin

ginettag27 said:

Globulator said:

For instance, an italic item will be [i]this is italic[].

I agree that you cannot overlap tags, i.e. do something like:
[b]bold [i]and italic[/b] or just italic[/i]

- but no one ever does that. To get the same effect you would just use:
[b]bold [i]and italic[][][i] or just italic[].

consider the following :

 <b>this is in bold <i>this is in italic and bold</>this is in?</> <i>this is in italic</>

How do you know what to render "this is in?" in? is it in bold or italic or neither?

The only way around that would be to make compulsory start tags, which would be a pain and a backwards step imo.

Alternatively why have a fullstop at the end of a typed sentence? Why don't you just say that a typed sentence ends when you come across a capital letter?

Apologies if I've got the wrong end of the stick...

It isn't the wrong end of the stick at all - it is in fact the problem!

This thread was opened in website feedback - someone moved it here instead.

The problem I see is that this website (PH) uses end tag names, except it doesn't check them and reject the post; it just munges through.

So my observation was:

Why bother assigning any meaning to end tags at all?
Particularly as they are not checked, but lay dormant.
- I realise they are read in order to apply the effect, but their syntax is never checked.

i.e.
Either A) Check the end tags and reject syntax errors
or B) Do not check the end tags and just use the start tags + a stack.
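Option A is also only a few lines of stack handling. Here is a hedged Python sketch of a checker that rejects mismatched named end tags; the function name `check_tags`, the error strings and the restriction to lowercase tag names are all assumptions, and this is not how PH actually works.

```python
import re

# Matches [name] and [/name] for lowercase names.
TAG = re.compile(r"\[(/?)([a-z]+)\]")


def check_tags(text):
    """Return None if every end tag matches, otherwise an error message."""
    stack = []
    for match in TAG.finditer(text):
        closing, name = match.group(1), match.group(2)
        if not closing:
            stack.append(name)
        elif not stack:
            return "unexpected [/%s]" % name
        elif stack[-1] != name:
            return "[/%s] closes [%s]" % (name, stack[-1])
        else:
            stack.pop()
    if stack:
        return "unclosed [%s]" % stack[-1]
    return None


print(check_tags("[b]Does it[/i]"))
```

Option B is the same loop with the name comparison deleted, which is the whole argument: the name in the end tag is only useful if something actually checks it.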

Outside of this thread I'd be intrigued to see any intentional and correct tag overlap in the whole site. So we all dutifully type in our closing tags - but when we are talking with a giant computer, easily capable of doing that for us, why do we have to do it?

<b> (turn on bold)
this is in bold
<i> (turn on italic)
this is in italic and bold
</> (turn off the last one (italic))
this is in?
</> (turn off the last one (bold))
<i> (turn on italic)
this is in italic
</> (turn off the last one (italic))

I was going to accent the above with formatting, but it seems the formatting does not go multi-line for the [i] and [b] codes.

[b]bold
fails[/b]

Sheesh!!

cottonfoo said:

Globulator said:

Perl of course enforces brackets

How and when?

Try:

if ($a == 0) $b = "hello";

in Perl, and you'll see smile

cottonfoo said:

Globulator said:

Try:

if ($a == 0) $b = "hello";

in Perl, and you'll see smile

You mean braces rather than brackets? For simple statements:

$b = "hello" if $a == 0;

No brackets or braces smile

Braces, yes.
I did forget the terminating if, thank-you for reminding me of another reason why I don't go back to Perl biggrin

cottonfoo said:

But anyway, back on topic, I like closing tags because debugging would be a nightmare otherwise. Also possible to overflow the stack.

As someone who has used the empty-end-tag format exclusively for content on two websites (and several more sub-sites), I'd have to disagree. From experience, debugging it has been stunningly simple.

If you used an array in PHP (which is what I do) or a linked list in C/C++ then you'd run out of memory for other reasons way before that time.

How many format levels do you use? 100? 200? This post used only 1 smile
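To put a number on "format levels", the deepest nesting in a piece of empty-brace markup can be measured with a counter instead of a full stack. This is a small Python sketch; the function name `max_depth` is hypothetical, and any non-empty [..] is assumed to open a level.

```python
import re

# Any bracketed token; an empty one ([]) is the closing tag.
TOKEN = re.compile(r"\[([^\[\]]*)\]")


def max_depth(text):
    """Return the deepest tag nesting reached anywhere in the text."""
    depth = 0
    deepest = 0
    for match in TOKEN.finditer(text):
        if match.group(1):
            depth += 1
            deepest = max(deepest, depth)
        elif depth:
            # Ignore a stray [] when nothing is open.
            depth -= 1
    return deepest


print(max_depth("[+1]This is big text [b]in bold[][]"))
```

Real posts rarely nest more than a handful of levels, so the stack stays tiny compared with the text itself.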

I'll admit the concept is a little 'out of the box', but I've tried it now and it works a treat, and is an absolute joy to enter.

cottonfoo said:

What do you do about self-closing tags like img, input and br? You'd still need a DTD to successfully parse that markup (but perhaps you have one).

I merely convert from my forum-text standard (cutetext - albeit a little ad-hoc) into XHTML, I don't go the other way very often (except in the JS WYSIWYG editor, when I go from design mode to cutetext - but that's another thing altogether + it's client side smile)

dilbert said:

I said I'd post....
It's working now!!!

Now THAT is interesting.

Mmmmm. We have product alignment.
Still about 3-4 weeks away, but yes, very close to what I'm doing, but complementary, not really overlapping. Which is nice.

PM me, we should chat thumbup
