Recursive Regular Expressions, part 2
October 16, 2009
Here is a generative grammar for arithmetic expressions taken from the Dragon Book p. 198:
E → E + T E → E - T E → T T → T * F T → T / F T → F F → ( E ) F → number
We can match its output with a Ruby 1.9 regular expression:
/^(?<E>(?<T>(?<F>([1-9]\d*|0)|\(\g<E>\))((\*|\/)\g<T>)*)((\+|\-)\g<E>)*)$/
For numbers, the regular expression matches non-negative integers.
The generative grammar is written to produce a rightmost derivation, but the regular expression must be written as if it were producing a leftmost derivation:
$ irb
irb(main):001:0> /(?<S>(\g<S>\+)*\d+)/
SyntaxError: (irb):1: never ending recursion: /(?<S>(\g<S>\+)*\d+)/
from /usr/local/bin/irb:12:in `<main>'
irb(main):002:0> /(?<S>\d+(\+\g<S>)*)/
=> /(?<S>\d+(\+\g<S>)*)/
irb(main):003:0> /(?<S>\d+(\+\g<S>)*)/.match("1+2+3")
=> #<MatchData "1+2+3" S:"1+2+3">
Recursive Regular Expressions
October 15, 2009
A formal regular expression is built from characters and three operations: concatenation, alternation, and the Kleene star. A formal regular expression can’t distinguish strings with balanced parentheses from those with unbalanced parentheses. Also it can’t match the set
{ aⁿbⁿ | n ≥ 1 }
where aⁿ is n occurrences of a. This follows from the equivalence of formal regular expressions and deterministic finite automata, a result shown in a 1956 paper by S.C. Kleene.
Some languages have regular expression extensions which make the above set matchable. In perl 5.8.8 we can do the following:
$ perl -e ' $re = qr/a(??{$re})*b/; print "matches\n" if "ab" =~ /^($re)$/ '
matches
$ perl -e ' $re = qr/a(??{$re})*b/; print "matches\n" if "aaaabbbb" =~ /^($re)$/ '
matches
$ perl -e ' $re = qr/a(??{$re})*b/; print "matches\n" if "aaaab" =~ /^($re)$/ '
$
In Ruby 1.9 we can do this:
$ irb irb(main):001:0> "ab".match(/^(?<c>a(\g<c>)*b)$/) => #<MatchData "ab" c:"ab"> irb(main):002:0> "aaaabbbb".match(/^(?<c>a(\g<c>)*b)$/) => #<MatchData "aaaabbbb" c:"aaaabbbb"> irb(main):003:0> "aaaab".match(/^(?<c>a(\g<c>)*b)$/) => nil
This feature isn’t available in Ruby 1.8, even though the Oniguruma regexp engine added it in 2003.
Python allows portions of the regular expression to be named and referenced, but as of 2.5.1 this cannot be done recursively:
$ python
Python 2.5.1 (r251:54863, Feb 6 2009, 19:02:12)
[GCC 4.0.1 (Apple Inc. build 5465)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import re
>>> re.match("^(?P<c>a(?P=c)*b)$", "ab")
<_sre.SRE_Match object at 0x787e0>
>>> re.match("^(?P<c>a(?P=c)*b)$", "aaaabbbb")
>>> re.match("^(?P<c>ab)(?P=c)*$", "ababab")
<_sre.SRE_Match object at 0x788a0>
Python 3.1.1 also appears to lack recursion.
Class Privacy and Object Privacy
October 1, 2009
C++ has class privacy, not object privacy:
$ cat private.cc
#include <stdio.h>
class A {
private:
void privy() { printf("hello\n"); }
public:
void pubby(A *a) { a->privy(); }
};
int
main(int argc, char **argv) {
A *a1 = new A;
A *a2 = new A;
a2->pubby(a1);
}
$ g++ private.cc
$ ./a.out
hello
Ruby has object privacy, not class privacy:
$ cat private.rb
class A
private
def privy
puts "hello"
end
public
def pubby(other)
other.privy
end
end
a1 = A.new
a2 = A.new
a2.pubby(a1)
$ ruby private.rb
private.rb:9:in `pubby': private method `privy' called for #<A:0x3e664c> (NoMethodError)
from private.rb:15:in `<main>'
Indian Creek
September 24, 2009
We arrived in Moab Tuesday afternoon. There was fresh snow on the La Sals, and the forecast called for thunderstorms, so we delayed going down to the Creek. Wednesday we repeated a couple of climbs along Potash road, bouldered at Big Bend, and then bought groceries for camping at the Creek.
We arrived in the Creek after dark, and we discovered that camping is no longer permitted at Newspaper Rock. Always a bad idea because the canyon is narrow and there is a danger of flash floods. The sandy, vegetation-free soil of the campground should have been a clue, but it was nice camping under the cottonwoods.
We camped at the Needles Outpost $25 a night. Showers are available. We met Tracy Napolitano. Thursday and Friday we climbed on Scarface Buttress. Here is a list of climbs I did. Of the climbs we did Thurs and Fri, I like these the most (2nd ed. numbering): #9 thin hands, face holds, and lieback, #34 wide hands and chimney, #12 Scarface. The guidebook is great. It amazes me that when the Wiggins party climbed Supercrack in 1976, the road into Moab was dirt and a wooden bridge crossed the Colorado. When were highways 191 and 211 paved?
Friday night we returned to Moab. Thousands were in town for a bike race, so we couldn’t get a hotel. We got a campsite in the Moab RV park near City Market $20 a night. The first night was marred by the drunken hillbillies next to us. The man kept calling the woman a stupid cunt. Apparently she turned his dog against him.
Saturday we climbed at Potash Road again. I did “Nervous in Suburbia” 10a, “Lucy in Sky over Potash” 10a, and “30 Seconds Over Potash” 5.8. A couple of locals climbed next to us, and they referred to the woman making the odd calls from the other side of the Colorado “goose lady”.
After climbing we drove the bike race route in reverse, going down river road and then up Castle Valley into the La Sals. We discovered Mill Creek, and saw where people park and hike down to the climbing. It is obvious the climbing is on Entrada sandstone. I asked someone at Pagan about it, and there are about 100 routes there. No guidebook, so just choose something you think you might be able to climb. It isn’t all high-end climbing: there is stuff in the 5.10 and 5.11 range.
I studied the map and consulted a hiking guide in the bookstore. If you were interested in making an ascent of Mt Peale (12721 ft elevation, 6200 ft of prominence, 23rd most prominent peak in the lower 48), you would take highway 46 east from La Sal Junction, and then get on Upper Two Mile Rd to La Sal Pass Rd. The latter two roads are dirt, but passable with 2WD. Drive up to the vicinity of La Sal Pass, a col which separates the south La Sals from the central La Sals. Mt Peale is an off trail route straight up to the north.
Sunday we went skydiving. When we walked back to the hangar, Steph Davis was sitting on her tailgate in her wingsuit.
We returned to the Creek and camped on the Bridger Jack Shelf Sunday evening. A thunderstorm came through and doused everything. We cooked pork chops, rotini pasta, and broccoli. A large praying mantis visited our campsite and I can’t recall ever watching one of these insects up close.
Monday we climbed again on Scarface Buttress. Climb #10 to the left of “Where’s Carruthers” is superb. It is 100′ long, and it starts out with thin hands and stemming in a pair of cracks. The upper section has cruxy liebacks between good rests. I protected in both cracks without using runners, which caused the 0.75 Camalot to walk deep into the crack. I spent over 15 minutes retrieving it by using nut tools and a sling to reach the release bar.
Character Palette vs Character Map
September 13, 2009
The GUI utility for poking special characters into a document is called Character Palette on Mac. On Windows it is called Character Map.
Mac Character Palette is better. It supports characters outside the Basic Multilingual Plane. Windows Character Map apparently uses UCS-2 internally. Even within the BMP, the fonts that Mac has installed by default cover more Unicode points. Mac will display all the glyphs for a selected Unicode point in the Font Variation drop down.
Windows, in contrast, makes the user choose a font, and then shows the supported points for the font and their glyphs. If the point isn’t supported, the user must try a different font. Each time the font is changed, the range of points displayed is reset to 0000-00A9. Thus looking through the fonts is tedious, especially if the user doesn’t know the Unicode point of the desired character.
Character Palette makes browsing easier by displaying the Unicode point numbers in the margins. It also organizes the Unicode characters into named sections, and displays the section names. Character Map will also do this if the value in the Group By box is set to Unicode Subrange, in which case the section names appear in a separate window.
Windows Character Map displays just 10 x 16 characters at a time. Character Palette is resizable, but can easily display 12 x 16 characters.
Mac can keep frequently used characters in a list of favorites.

The Mac Character Palette is an always-on-top window, which I find irritating. When inserting characters, they go into the window with focus, but it is easy to forget which window has focus when the Character Palette is covering it up.
Windows Character Map does not insert characters directly into other applications. Instead it uses the system clipboard, adding an extra keystroke to the process. However, several characters can be selected and added to the clipboard together.

Versions used: Mac OS X 10.5 and Windows Vista
Cuneiform Ruby
July 3, 2009
I’m guessing that you don’t have a cuneiform font installed. Install one now.
Cuneiform characters were officially assigned Unicode points in 2006. None of these points are in the Basic Multilingual Plane. A lot of software which claims to handle Unicode cannot handle points outside of the BMP. For example, before version 1.5 java assumed all characters could fit into 16 bits. So cuneiform is a good test of the system.
Ruby 1.9 extends the \u notation for entering Unicode points in strings:
bash$ irb
irb(main):001:0> puts "\u{1204C}"
𒁌
Better yet, use cuneiform as a variable
bash$ cat cuneiform.rb # encoding: utf-8 𒀻 = 13 puts 𒀻 bash$ ruby cuneiform.rb 13
I had better luck with vi than emacs editing the file. There were display issues. Beacuse 𒀻 is quite wide for a single character and Terminal is fixed width, the cuneiform character was overlapping the characters that follow it.
bash$ cat cuneiform.rb
# encoding: utf-8
class 𒀽
def puts
puts "I am cuneiform: \u{1203D}"
end
end
c = 𒀽.new
c.puts
bash$ ruby cuneiform.rb
cuneiform.rb:3: class/module name must be CONSTANT
So this tells us that Ruby considers the character to be lower case. Ruby doesn’t appear to recognize any non-ASCII capital letters in source code:
bash$ cat greek.rb
#encoding: utf-8
class Λ
def initialize(λ)
@λ = λ
end
def apply(*a)
@λ.call(*a)
end
end
λ = Λ.new(lambda { |x,y| x+y })
puts λ.apply(3,7)
bash$ ruby greek.rb
greek.rb:2: class/module name must be CONSTANT
Ruby is aware of capitals in strings:
irb(main):010:0> s = "\u039B"; puts s; /\p{Upper}/.match(s)
Λ
=> #<MatchData "Λ">
irb(main):011:0> s = "\u03BB"; puts s; /\p{Upper}/.match(s)
λ
=> nil
irb(main):012:0> s = "\u{12040}"; puts s; /\p{Upper}/.match(s)
𒁀
=> nil
We can use Unicode in symbols:
# encoding: utf-8
alias :λ :lambda
add = λ { |x,y| x + y }
puts add.call(3,11)
Kindle 2 Thoughts
June 28, 2009
I like this device. I’d like to get rid of my four shelves of books and rely on this thing. Unfortunately, we’re not quite there yet, since many of the books aren’t available as Kindle editions. The replacement cost would be high, though if I paid for my Kindle books as I needed them, it might not be too bad. Many of my books I probably won’t use again.
Some areas for improvement in order of declining importance:
availability and quality of Kindle books:
Lots of books still not available. Some Kindle books have quality issues. Some lack a table of contents with links, and left and right on the 5-way controller, which is supposed to move by chapter or section, doesn’t always work. I see books with bits like “procrast- ination”, which suggest some sort of problem in the conversion to Kindle process. Downloading a sample of the book first or reading reviews can help avoid these bad books, but it might not be a big deal since the bad books are usually the dollar books.
navigation:
Should be able to execute commands in the menu with one or two keystrokes. Droppping the menu and navigating to the desired action with the 5 way can easily be 5 or more keystrokes (counting the 5 way as a key). Also, sometimes a sentence or paragraph is broken by the page break, and it is desirable to quickly see the whole. I would like to be able to reposition the page by one location with a single stroke. Best I can do now is use the goto location in the menu. There is some weirdness I don’t understand about the goto location, because sometimes the location provided doesn’t become the beginning of the page. Are not all locations positionable at the top of the page? Also, if you page backward, sometimes it takes more page forwards to return to the original position, which is disorienting.
screen refresh:
Too slow, affects navigation, both when paging and moving the cursor. Is this a limitation of the screen technology or the speed of the processor?
symbols
Using the symbol palette that pops up when you hit the SYM key stinks. First of all, there aren’t enough symbols. When making a note on a mathematical text, I would like to have Greek letters. Also, when using the 5 way, I find it too easy to push in instead of push left, which results in the wrong symbol selected. Not sure what the solution is. Have every symbol in the table numbered, so it can be selected with one or two keystrokes? Or add symbols to the number keys like a regular keyboard (plenty of space above the keyboard)?
internationalization:
There are some foreign language texts, and accented Latin displays correctly, but no foreign language dictionaries. Ideally, I would like to set two dictionaries simultaneously when reading a foreign text, to have for example a Spanish dictionary and a Spanish-to-English dictionary.
pdf:
Sent a PDF to amazon to have it converted and downloaded. It was a challenging document, 3M of images scans of a 1937 master’s thesis. Amazon chopped it up into pages. One page would be the best guess of the text in the image, and the next would be the image. Document size on the kindle didn’t increase much, and the process took about 5-10 minutes and cost $0.75. Although the text was ugly and had lots of errors and was not completely necessary because the image quality was readable, I see the value of including it for search. Unfortunately, I didn’t change the file name when I attached it in the email, and ended up with 344541425.pdf as the document name on my home screen. Could not find out how to change it. Not even hooking up my laptop via USB and editing the file name.
web, mp3:
The browser could be improved. The external speakers make crackle noise. I haven’t tried plugging in headphones.
other:
One can image other improvements: color screen, backlit screen. These may add more cost than value. I hanker for a stylus like the old palm pilot, for taking notes, or even doing the exercises in a technical work, so I don’t have to have an extra notebook and pen. Seems like there is room for increasing the screen size without increasing the overall size of the device: about an inch is wasted horizontally.
gcc and the mac
June 20, 2009
Re-acquainting myself with gcc. Since the man page is extensive, I summarize some useful options.
-c : compile and assemble, don’t link
-S : compile, don’t assemble
-E : preprocess, don’t compile
-o FILE : create target named FILE
-v : print version number. Output commands used during compilation
-Wall : highest warning level
-w : turn off warning
-g : add debugging info to object (for gdb)
-p : add profiling info to object (for prof)
-pg : add profiling info to object (for gprof)
-Q : output compiler stats for each function
-O,-O1 : optimize
-O2 : optimize more
-O3 : highest optimization level
-fast : optimize for speed
-Os : optimize for size
-D NAME : for preprocessor, set macro NAME to 1
-D NAME=VALUE: set macro NAME to VALUE
-U NAME : undefine macro NAME
-I DIR : add DIR to include search path
-L IDR : add DIR to library search path
-M : output make dependencies
-l LIB : search libLIB for symbol definitions (linker)
Some oddities peculiar to the mac that I’m not used to. Here’s how to create a shared object:
gcc -c stack.c gcc -dynamiclib -o libstack.dylib -dylib stack.o
Also, the mac version of gcc appends an underscore to symbols
bash $ cat main.c
#include <stdio.h>
void
hello_world() {
printf("hello world\n");
}
int
main(int argc, char **argv) {
hello_world();
}
bash $ gcc main.c
bash $ ./a.out
hello world
bash $ nm a.out
0000200c D _NXArgc
00002008 D _NXArgv
00002000 D ___progname
00001fb4 t __dyld_func_lookup
00001000 A __mh_execute_header
00002004 D _environ
U _exit
00001fc2 T _hello_world
00001fe3 T _main
U _puts
00002010 d dyld__mach_header
00001fa0 t dyld_stub_binding_helper
00001f60 T start
Thought Systems Bundled as Seven Imperatives
June 14, 2009
A common example of this is Covey’s set:
- be proactive
- begin with the end in mind
- first things first
- think win-win
- first understand, then be understood
- synergize
- sharpen the saw
Lean software development is also septofascicular:
- deliver fast
- decide late
- eliminate waste
- empower the employee
- amplify learning
- build in integrity
- see the whole
The Rock Warrior’s Way organizes itself into seven principles, but Ilgner doesn’t distill them out as mnemonic imperatives. So I attempt my own formulations:
- become self-aware
- see subtlety
- accept responsibility
- ask what you can give
- commit with unbending intent
- don’t expect, listen
- value the journey, not the destination
Shortcut Keys
June 13, 2009
Note to self: these shortcut keys are useful. Practice using them.
mac
I set the dock to hide itself. I set the number of spaces to two, so that I am never searching for a window. If I don’t see it, it’s in the other space. Also, both arrow keys take me to the other space, so I don’t need to remember where I am. Also just learned about pbcopy and pbpaste.
cmd-del move selection to trash cmd-rarrow move between spaces (click and hold on window to carry it with you) cmd-larrow move between spaces cmd-E eject cmd-W close window cmd-Q quit app cmd-+ adjust magnification cmd-- F11 expose (at least that's where I want it) cmd-M minimize window cmd-TAB change app
emacs
Don’t exit emacs to prowl the filesystem or run a command. Wasteful.
M-! execute external cmd M-x shell shell buffer C-c C-c send C-c (shell mode) C-c C-d send C-d (shell mode) C-c C-z send C-z (shell mode) C-x C-b display buffer list M-x sort-lines M-x reverse-region ^ visit parent directory (dired) d mark for delete (dired) i display contents of subdir at bottom of same buffer (dired) o open file in different window (C-o doesn't change focus) m mark (dired) u unmark (dired) x expunge files marked for delete (dired) # mark auto-save files for deletion (dired) ~ mark emacs backup files for deletion (dired) D delete marked files (dired) R rename marked files (multiple files are moved into dir) C copy marked files + make a subdir (dired) $ hide subdir (dired) M-$ hide all subdirs (dired) M-x find-name-dired M-x find-grep-dired M-x find-grep M-x wdired-change-to-wdired-mode rename and move files by editing the dired buffer
ssh, screen
Remember to ssh user@host instead of ssh host followed by su user. Use ssh -X to launch X windows, and then emacs can be in its own window. Also, you pay for that up front laziness: how to set up ssh keys.
Basic screen procedure: run screen, run the comand, then detach with C-a d. Reattach with screen -r. If multiple screen sesssions, specify pid and terminal. Screen is unkind to emacs users because of the C-a binding but that is all you need. Stop using nohup.
firefox
installed “It’s All Text!” and reinstalled Firebug and YSlow. For the record the other addons I find useful: Live HTTP Headers, Web Developer, google and del.icio.us toobars.
websites
got a github account. Pastie at gist.github.com
also look into
- running svn commands from within emacs
- keep some standard dot files .bashrc .irbrc .emacs at github
- install these emacs files
- color highlighting in emacs not working on linux