May 7, 2008

Search and Replace Wildcard Characters in Dreamweaver

Dreamweaver has a powerful search/replace feature that supports ‘regular expressions’. This lets you find and replace particular pieces of HTML markup without affecting the data inside them.

Here’s an example of how useful this is:

Let’s say you are harvesting content from Wikipedia and putting it on your own website. Though I don’t endorse that activity, it is allowed according to the GFDL and a lot of people do it. So, here’s a way to make it easy.

Let’s say you find a huge table full of data that you wish to use on your own website. Since Wikipedia has tons of links and junk in the code, you want to strip it all out. One way is to copy and paste the table into Excel, then save it as a .csv or .txt file, which strips out the extra invisible HTML formatting code. Then close the file, re-open the .csv or .txt in Excel, and copy and paste into Dreamweaver (or your HTML editor). While this works, Excel still seems to copy over some HTML formatting, such as <td height="17" … on every field of the table, which is annoying.

You can do a find/replace in Dreamweaver for height="17" and leave the replace field blank. That solves that problem. However, Wikipedia often uses footnotes, which add [1] [2] [3] etc. in superscript, as in the example below. The question is how to delete these brackets from our copied table but preserve the data. Good news: we can do that in Dreamweaver using regular expressions.

Change from this:
<td height="17">Batman Begins[1][2]</td>
<td height="17">Superman</td>
<td height="17">Army of Darkness[3]</td>

Into this:
<td>Batman Begins</td>
<td>Superman</td>
<td>Army of Darkness</td>

If you had a huge table like this with 100+ rows and 50 or more footnotes, it would take a long time to remove all of the brackets by hand. Here’s a way to automate it in Dreamweaver:

(Make sure you are searching the ‘Source Code’ and that the ‘Use regular expression’ box is checked.)

Find:
<td height="17">([^<\[]*)\[[^<]*</td>

Replace:
<td>$1</td>

Result:
Dreamweaver strips the footnote markers and the height attribute out of every matching cell while preserving your data. In this case, the stored wildcard variable keeps everything between <td height="17"> and the first [ bracket. Cells without a footnote (like Superman) are not matched by this pattern, so clear their height="17" with the plain find/replace described above.

Explanation:
The find starts with the opening tag, followed by the stored wildcard variable: ([^<\[]*) which matches any run of characters other than < and [. Leaving [ out of the stored part matters: a stored ([^<]*) grabs as much as it can, so "Batman Begins[1][2]" would come back as "Batman Begins[1]". Next I wanted to remove the brackets, so I put one in, but since we’re using expressions, it has to be ‘escaped’ to tell Dreamweaver that we literally mean the bracket, so I put a \ before the [. Then I added a non-stored wildcard (the other junk I want removed): [^<]* which cannot run past the closing tag, and finally the close tag </td>. The replace is simply the $1 variable between the tags, which recalls the stored variable. Very cool!
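
If you want to try the same idea outside Dreamweaver, here is a minimal sketch using Python's re module. It is only an illustration under a few assumptions: the sample rows are the three cells from the example above, the stored group leaves out [ so that it stops at the first footnote marker, and Python writes the stored variable as \1 where Dreamweaver writes $1.

```python
import re

# Sample rows copied from the example above.
html = """<td height="17">Batman Begins[1][2]</td>
<td height="17">Superman</td>
<td height="17">Army of Darkness[3]</td>"""

# Stored group: the cell text up to the first "[" or "<".
# Unstored part: an escaped "[" followed by the rest of the footnote markers.
footnote_cell = re.compile(r'<td height="17">([^<\[]*)\[[^<]*</td>')

# \1 recalls the stored group (Dreamweaver's $1).
cleaned = footnote_cell.sub(r'<td>\1</td>', html)

# Cells without a footnote are not matched above,
# so remove the leftover attribute with a plain text replace.
cleaned = cleaned.replace(' height="17"', '')

print(cleaned)
```

The second step mirrors the plain find/replace for height="17": the regular expression only touches cells that actually contain a bracket.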

Another challenge:
Let’s say you want to copy a huge list of links from Wikipedia and point them at pages on your own website. Here’s an example:

Change from this:
href="/wiki/Army-of-Darkness">
href="/wiki/Raiders-of-the-Lost-Ark">
href="/wiki/Pulp-Fiction">

Into this:
href="http://www.domain.com/Army-of-Darkness.php">
href="http://www.domain.com/Raiders-of-the-Lost-Ark.php">
href="http://www.domain.com/Pulp-Fiction.php">

In Dreamweaver, select Find/Replace…

1. Check ‘Use regular expression’
2. Do Find for:
href="/wiki/([^"]*)">
3. Replace:
href="http://www.domain.com/$1.php">
4. The stored wildcard stops at the closing quote, so $1 holds just the page name and Dreamweaver drops it into the new link
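
The same link rewrite can be sketched in Python as a quick check before running it across a whole site. This is a rough sketch, not Dreamweaver's own behaviour: the three anchor tags are made-up sample input built from the example links, the stored group is written as [^"]* so that it stops at the closing quote, and \1 plays the role of $1.

```python
import re

# Made-up sample input built from the example links above.
html = """<a href="/wiki/Army-of-Darkness">Army of Darkness</a>
<a href="/wiki/Raiders-of-the-Lost-Ark">Raiders of the Lost Ark</a>
<a href="/wiki/Pulp-Fiction">Pulp Fiction</a>"""

# Stored group: everything after /wiki/ up to the closing quote.
wiki_link = re.compile(r'href="/wiki/([^"]*)">')

# Rebuild each link around the stored page name and add .php.
rewritten = wiki_link.sub(r'href="http://www.domain.com/\1.php">', html)

print(rewritten)
```

Only the href attribute changes; the link text and closing tags are left alone because they sit outside the matched span.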

Without the regular expression, you could have done Find/Replace for the first part, but when you wanted to add the .php to the end, you’d be stuck. How else would you do it?

Pretty incredible, huh? You can automate the changing of links or anything on an entire website with thousands of links and pages in just seconds. All using the stored wildcard variable.

([^<]*) is stored because of the parentheses (use $1 to retrieve it in the replace)
[^"]* is unstored (no parentheses, so it is matched but not kept)

You can also do Find/Replace to recall multiple variables at once, like this:

If multiple wildcards:
([^<]*) ([^<]*) ([^<]*)
Use:
$1 $2 $3
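
To see several stored variables at work, here is a small Python sketch that swaps the first two cells of a table row. The row is a made-up sample and the pattern is my own illustration, not something taken from Dreamweaver; \1, \2 and \3 stand in for $1, $2 and $3.

```python
import re

# Made-up sample row with three cells.
row = '<td>Army of Darkness</td><td>1992</td><td>Sam Raimi</td>'

# Three stored groups, one per cell.
three_cells = re.compile(r'<td>([^<]*)</td><td>([^<]*)</td><td>([^<]*)</td>')

# Recall the groups in a different order: second cell first, then first, then third.
reordered = three_cells.sub(r'<td>\2</td><td>\1</td><td>\3</td>', row)

print(reordered)
```

The groups are numbered left to right by their opening parenthesis, so the replace can reuse them in any order.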

A tool like this can give you the power to harvest public domain or free content, manipulate data and repurpose it for your own site.


The regular expression you seek is: [^"]*

so if you want to change all these:

<a class="wildcat" href="lion.html">
<a class="wildcat" href="tiger.html">
<a class="wildcat" href="leopard.html">

to:

<a class="wildcat" href="bigcat.html">

you search for:

<a class="wildcat" href="[^"]*">

and replace with:

<a class="wildcat" href="bigcat.html">

(make sure you tick ‘match case’ and ‘use regular expression’, and select ‘source code’)
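
For comparison, the unstored wildcard behaves the same way in Python. This is a minimal sketch using the three sample links above; nothing is stored, so the replacement is plain text with no \1 or $1 in it.

```python
import re

# The three sample links from the example above.
links = """<a class="wildcat" href="lion.html">
<a class="wildcat" href="tiger.html">
<a class="wildcat" href="leopard.html">"""

# [^"]* matches whatever file name sits between the quotes, without storing it.
any_wildcat_link = re.compile(r'<a class="wildcat" href="[^"]*">')

result = any_wildcat_link.sub('<a class="wildcat" href="bigcat.html">', links)

print(result)
```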
