I’ve been using and enjoying MadCap Flare for a number of webhelp projects lately. I find that it’s superior to RoboHelp in pretty much every way (except for stability and bug count.)
One particularly annoying bug that I’ve found is that Flare, on occasion, decides to insert non-breaking space characters instead of regular space characters when you hit the space bar. The result is that you end up with line breaks in weird and unsightly places when you generate webhelp or printed documentation.
In most cases, the easiest way to fix the problem is to replace all instances of the entity with a regular space character using your favorite search and destroy program. But what happens if you used some of those characters on purpose? I had this problem recently when documenting some XML formats, using non-breaking spaces to indent the lines of sample code.
Since I couldn’t just search and destroy on without losing all of my hard Ctrl-space work, I wrote a script in Ruby to eliminate most non-breaking spaces while preserving the ones I wanted to keep. Read on for more details.
In my case, I knew that all of the non-breaking spaces that I wanted to keep were contained within <p> blocks with a class of “codesample.” All others could be eliminated. Here’s how the script works, with the actual code below:
- We use find to traverse the directory and all sub-directories of the startDir. We process any HTML files.
- When we find an HTML file, we open the file and save the contents to a string called (cleverly) thestring.
- Now things get fun. We create a regular expression to identify all <p> blocks with a class of “codesample.” We use the split method to search thestring for code samples. split returns an array of strings containing the parts of the file surrounding any code sample blocks. We’ll call this textarray.
- Next, we use the scan method on thestring. scan returns an array containing all the code sample blocks. We’ll call this tossedarray. Since scan and split perform the opposite tasks using the same regular expression, there should always be exactly one fewer items in the array returned by scan than by split.
- Finally, we use an iterator to step through the two arrays. For each textarray value, we use the gsub method to replace all s with spaces. We add the sanitized textarray values and the untouched tossedarray values to a new string, outputstring. To finish it off, we write outputstring over the contents of the original file.
Here is the code:
require 'find' # to traverse the directory structure
def initialize(filetosearch) @thefile = File.open(filetosearch, "r") @thestring = @thefile.read() # Save the contents of the file as a string @thefile.close @thefile = File.open(filetosearch, "w") @thefile.print(self.ignoreUntouchable) @thefile.close end #initialize def ignoreUntouchable notInMe = %r(<p(?:.*?)class="codesample"(?:.*?)>(?:.*?)</p>) #matches codesample blocks eliminateMe = / / #matches non-breaking space characters @textarray = @thestring.split(notInMe) #array of strings that aren't codesamples @tossedarray = @thestring.scan(notInMe) #array of strings that are codesamples, should always have one fewer items than textarray @outputstring = "" for i in 0...@textarray.size if !@textarray[i].nil? then @textarray[i] = @textarray[i].gsub(eliminateMe, ' ') #replace anything matching eliminateMe with a space @outputstring << @textarray[i] #add it to the output string end #if if !@tossedarray[i].nil? then @outputstring << @tossedarray[i] #add untouchable chunk to the output string end end return @outputstring end #tokenizeUntouchable end #FileSearcher startDir = "C:\Documents and Settings\" #search this and all subdirectories fileType = ".htm" Find.find(startDir) do |thePath| if thePath.include? fileType print("Processing " << thePath << " ... ") FlareFixer.new(thePath) puts("Done!") end #if end