PHP function to validate HTML tags for user input fields.

I have some forms with text areas that take user input. My forms aren’t as fancy as the editor that WordPress provides. I want to allow users to add HTML tags, but I need to make sure they close their divs and other tags, which could break the whole page if left unclosed. I also want to control which tags I allow. So I wrote a function that uses htmlentities to disable all markup and then, for the tags I want to allow, does some checking and converts them back into enabled markup.
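To make the first step concrete, here is a minimal sketch of what htmlentities does to a small piece of markup. The sample string is made up for illustration, and passing ENT_QUOTES explicitly is my assumption so that single quotes get escaped the same way on every PHP version.

```php
<?php
// Hypothetical sample input; any user-submitted string would do.
$input = '<div class="note">Hi</div>';

// ENT_QUOTES escapes double and single quotes as well as <, > and &.
$escaped = htmlentities($input, ENT_QUOTES, 'UTF-8');

echo $escaped, "\n";
// prints: &lt;div class=&quot;note&quot;&gt;Hi&lt;/div&gt;
```

After this step nothing in the text is live markup any more, so whatever gets re-enabled later is an explicit choice.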

The function won’t fix broken markup, but it will isolate it: a tag that has to be closed and isn’t stays escaped as plain text, so it doesn’t affect the rest of the page.
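Here is a rough sketch of that isolating behavior for a single tag. The helper name enable_closed_tag is hypothetical and the pattern is a simplified version of the idea, not the full function: it only re-enables an opening tag when a matching closing tag follows it.

```php
<?php
// Hypothetical helper: re-enable one tag, but only where an opening tag
// is followed by a matching closing tag.
function enable_closed_tag($escaped, $tag) {
    $pattern = '/&lt;' . $tag . '(\s.*?)?&gt;(.*?)&lt;\/' . $tag . '&gt;/si';
    $replace = '<' . $tag . '$1>$2</' . $tag . '>';
    // Repeat until no escaped pair is left, so nested pairs are handled too.
    while (preg_match($pattern, $escaped)) {
        $escaped = preg_replace($pattern, $replace, $escaped);
    }
    return $escaped;
}

// Hypothetical input: one well-formed tag and one div that is never closed.
$text = htmlentities('<b>ok</b> <div>never closed', ENT_QUOTES, 'UTF-8');
foreach (array('b', 'div') as $tag) {
    $text = enable_closed_tag($text, $tag);
}

echo $text, "\n";
// prints: <b>ok</b> &lt;div&gt;never closed
```

The unclosed div is still shown to the reader as literal text, which is the whole point: it can’t swallow the rest of the page.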

I’ve only had this function running for a day and I can’t promise I have the bugs all worked out. But I think it’s good. Please let me know if you see or find any problems with it.

I’ll copy the function below so you can get a quick view of what it does. I’m having trouble getting WordPress to display it correctly, though, so I also put the source code in a plain text file: http://www.suffolkian.com/bnmng_old/blog_files/bnmngcomputing/function_safetags.txt (of course, you’ll have to put it in a PHP file to make it work).

function safetags($text) {

    # This function uses htmlentities to disable all markup, then converts those tags
    # that are allowed and properly formed back into enabled markup.

    $loner_tags = array('br', 'hr', 'img', 'p'); # These tags can be left unclosed, but it's OK if they are closed.
    $closed_tags = array('a', 'b', 'center', 'div', 'left', 'li', 'right', 'span', 'ul'); # These tags have to be closed.

    $pattern = array();
    $replace = array();

    # Convert the entire text to HTML entities. This turns all tags and quotes into harmless strings.
    $text = htmlentities($text, ENT_QUOTES, 'UTF-8');

    # Lookbehind used below: an escaped < or > that follows a backslash is never re-enabled.
    $unescaped = '(?<![\\\\])';

    # For all allowed tags, enable the ones that have closing tags.
    $all_tags = array_merge($loner_tags, $closed_tags);

    foreach ($all_tags as $tag) {
        $pattern[] = '/' . $unescaped . '&lt;' . $tag . '(\s+.*?)?' . $unescaped . '&gt;(.*?)' . $unescaped . '&lt;\/' . $tag . '&gt;/si';
        $replace[] = '<' . $tag . '\1>\2</' . $tag . '>';
    }

    # Enable the tags that don't have closing tags and are allowed without them.
    foreach ($loner_tags as $tag) {
        $pattern[] = '/' . $unescaped . '&lt;' . $tag . '(\s+.*?)?\/?' . $unescaped . '&gt;/si';
        $replace[] = '<' . $tag . '\1>';
    }

    # Within each enabled tag, convert the properly quoted attributes back to real quotes.
    $pattern[] = '/<([^<>]*?)&quot;(.*?)&quot;([^<>]*?)>/';
    $replace[] = '<\1"\2"\3>';

    $pattern[] = '/<([^<>]*?)&#039;(.*?)&#039;([^<>]*?)>/';
    $replace[] = '<\1\'\2\'\3>';

    # Here's where the work gets done. Each pattern is reapplied until it no longer matches.
    foreach ($pattern as $i => $p) {
        while (preg_match($p, $text)) {
            $text = preg_replace($p, $replace[$i], $text);
        }
    }

    return $text;
}
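The reason for looping instead of calling preg_replace once shows up most clearly in the attribute step. Below is a small sketch, assuming a tag that has already been re-enabled but whose quotes are still escaped as &quot;. The input string and the exact pattern are my illustration of the idea; each pass can only restore one quoted attribute per tag, so a tag with two attributes needs two passes.

```php
<?php
// Hypothetical input: the <a> tag is already enabled, its quotes are not.
$text = '<a href=&quot;/home&quot; title=&quot;Home&quot;>Home</a> said &quot;hi&quot;';

// Only touch &quot; pairs that sit between a real < and the next real >.
$pattern = '/<([^<>]*?)&quot;(.*?)&quot;([^<>]*?)>/';
$replace = '<$1"$2"$3>';

// One pass restores one attribute per tag, so repeat until nothing matches.
while (preg_match($pattern, $text)) {
    $text = preg_replace($pattern, $replace, $text);
}

echo $text, "\n";
// prints: <a href="/home" title="Home">Home</a> said &quot;hi&quot;
```

Quotes in ordinary text outside of a tag stay escaped, which is harmless because the browser displays them as normal quotation marks.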

Update (2010 01/03): I’m amazed whenever I get hit. But the one person who happened to stumble upon my blog this month also managed to hit a broken link. Oh well. The link’s fixed now.
