I’ve been searching high and low for a simple HTML parser, written in JS, with a SAX-like interface, so that I can filter out potentially unsafe content. I’ve looked at existing Perl, Java, C (and more) parsers, but they are either too heavy or rely too much on existing frameworks.
So I created my own. This one is not very strict: the goal is to take a string containing HTML, send it through the parser, and build another HTML string. The resulting string is then passed to Mozilla for rendering. So, if the original HTML is incorrect but works in Mozilla, it is OK for the resulting HTML to contain the same error.
The parser is SAX-based: it reports start tags, end tags, and text to a content handler. On top of that, I have a specific content handler that filters out things that are dangerous.
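To make the parser/handler split concrete, here is a minimal sketch of the idea. It is not the actual Sage code: the names parseHtml, SafeHandler, and sanitize are hypothetical, and the list of blocked elements and attributes is an assumption chosen for illustration. The tokenizer is deliberately lenient, and the filter is far from a complete sanitizer (for example, it does not decode entities before checking for javascript: URLs).

```javascript
// Hypothetical sketch, not the Sage parser. All names are made up for illustration.

// Lenient tokenizer: reports start tags, end tags and text to a SAX-style handler.
function parseHtml(html, handler) {
  var tagRe = /<(\/?)([a-zA-Z][a-zA-Z0-9]*)((?:"[^"]*"|'[^']*'|[^'">])*)>/g;
  var attrRe = /([a-zA-Z_:][\w:.-]*)(?:\s*=\s*(?:"([^"]*)"|'([^']*)'|([^\s"'>]+)))?/g;
  var last = 0, m;
  while ((m = tagRe.exec(html)) !== null) {
    // Anything between two tags is reported as character data.
    if (m.index > last) handler.characters(html.slice(last, m.index));
    last = tagRe.lastIndex;
    var name = m[2].toLowerCase();
    if (m[1] === "/") { handler.endElement(name); continue; }
    var attrs = [], a;
    attrRe.lastIndex = 0;
    while ((a = attrRe.exec(m[3])) !== null) {
      // Value may be double-quoted, single-quoted, unquoted, or absent.
      var value = a[2] !== undefined ? a[2]
                : a[3] !== undefined ? a[3]
                : a[4] !== undefined ? a[4] : "";
      attrs.push({ name: a[1].toLowerCase(), value: value });
    }
    handler.startElement(name, attrs);
  }
  if (last < html.length) handler.characters(html.slice(last));
}

// Filtering content handler: rebuilds an HTML string, dropping unsafe parts.
function SafeHandler() {
  this.out = [];
  this.skipping = null; // name of the blocked element we are currently inside
}

// Assumed block list; a real filter would need a much more careful policy.
SafeHandler.BLOCKED = ["script", "style"];

SafeHandler.prototype.startElement = function (name, attrs) {
  if (this.skipping) return;
  if (SafeHandler.BLOCKED.indexOf(name) !== -1) { this.skipping = name; return; }
  var s = "<" + name;
  for (var i = 0; i < attrs.length; i++) {
    var n = attrs[i].name, v = attrs[i].value;
    if (/^on/.test(n)) continue;               // drop event handlers (onclick, ...)
    if (/^\s*javascript:/i.test(v)) continue;  // drop javascript: URLs
    s += " " + n + '="' + v.replace(/"/g, "&quot;") + '"';
  }
  this.out.push(s + ">");
};

SafeHandler.prototype.endElement = function (name) {
  if (this.skipping) {
    if (name === this.skipping) this.skipping = null;
    return;
  }
  this.out.push("</" + name + ">");
};

SafeHandler.prototype.characters = function (text) {
  if (this.skipping) return;
  // Stray angle brackets in text are escaped so they cannot form new tags.
  this.out.push(text.replace(/</g, "&lt;").replace(/>/g, "&gt;"));
};

SafeHandler.prototype.toString = function () { return this.out.join(""); };

// Convenience wrapper: HTML string in, filtered HTML string out.
function sanitize(html) {
  var handler = new SafeHandler();
  parseHtml(html, handler);
  return handler.toString();
}

console.log(sanitize('<p onclick="evil()">Hi <b>there</b><script>alert(1)</script></p>'));
// prints: <p>Hi <b>there</b></p>
```

Because the tokenizer knows nothing about filtering, the same parseHtml function could drive any other handler, which is the main appeal of the SAX-style split. Unbalanced or otherwise sloppy markup is passed through as-is, in keeping with the "not very strict" goal.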
These files were created for Sage, so they will be available under the MPL.