gopher.black

       The vi/ex Editor, Part 6: Addresses and Columns
       
         Screen-Mode Addresses
           A Few Address Principles
           Useful Addresses
         Editing in Columns
           Single-Character Columns
           Multi-Character Columns
         Next Time Around
       
       By popular demand I'm trying something new in the tutorial,
       starting with this installment. The e-mail I receive from
       tutorial readers most often asks me how to do some specific type
       of editing job, using whatever editor tools are needed.  So, I'm
       now mixing my general-principle explanations with in-depth
       coverage of particular work areas.
       
       The first application area I'm covering is the one readers ask
       about most often, by far: editing files where columns are a major
       factor.  Future areas are up to you readers.  If you have an
       application area you'd like to see explained in some depth,
       e-mail me your suggestion.
       
       Screen-Mode Addresses
       
       You use them all the time.  They're the address targets that tell
       screen-mode commands like c d y which stretch of your file to act
       on.  And even more often you use such addresses without commands,
       to move around in the file.
       
       For starters, I'll tell you some basics of screen-mode addressing
       that aren't particularly clear to most editor users.  Then it's
       on to a few powerful but obscure addresses that most of us rarely
       or never use.
       
       A FEW ADDRESS PRINCIPLES
       
       The first fact of screen-mode range addresses is simple enough:
       one end of the range to be affected by the command is always
       marked by the cursor itself.  The address you give the command
       (always a single address) indicates where the other end of the
       affected range is to be. The address target can be either forward
       or backward from the cursor position, in most cases.  But exactly
       how the cursor and the target terminate the two ends of the range
       is variable.
       
       At the start we have to distinguish between line addresses and
       character addresses.  Line addresses are very straightforward:
       the command affects the entire line the cursor is on, the entire
       line where the address point is located, and all the lines in
       between.  If you are using an address without a command, in order
       to move the cursor, a line address generally puts the cursor on
       the first non-whitespace character in the line addressed.
       
       But line versus character addresses affect a lot more than
       exactly what's included in the range.  As one example, if you
       yank or delete text using a line address and then place that text
       somewhere with a p or P command, that text will appear on a new
       line or lines, below or above the line you are on, respectively.
       But if you yanked or deleted with a character address, when you
       put the text back in, it will appear within the line you are on,
       just after or before the cursor, respectively.  And to dispose of one
       editor fallacy here and now, it does not make a bit of difference
       that the range of text you yanked or deleted with a character
       address amounts to exactly one or more lines -- it will still
       behave as any other text yanked or deleted with a character
       address.
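       The following Python sketch models the p command so the two kinds
       of put can be compared side by side. It is a simplified model, not
       vi's code: the name put_after, the 0-based row and column, and the
       list of strings standing in for the buffer are all assumptions for
       illustration. P, named buffers and counts are left out.

```python
def put_after(lines, row, col, text, linewise):
    """Model of the p command; returns a new list of lines.

    row and col are 0-based cursor coordinates.
    linewise=True:  text is a list of whole lines, placed on new lines
                    below the cursor line.
    linewise=False: text is a string (it may contain newlines), placed
                    right after the cursor character, splitting the line
                    where a newline occurs.
    """
    if linewise:
        # Line-mode text never lands inside the current line.
        return lines[:row + 1] + list(text) + lines[row + 1:]
    # Character-mode text is spliced in after the cursor, even when it is
    # exactly one whole line (that is, ends in a newline).
    merged = lines[row][:col + 1] + text + lines[row][col + 1:]
    return lines[:row] + merged.split("\n") + lines[row + 1:]
```

       In the second case below the text amounts to exactly one line, but
       because it is treated as character-address text, it is spliced into
       the middle of the current line instead of getting a line of its own:
       put_after(["abc", "xyz"], 0, 0, ["NEW"], True) gives
       ["abc", "NEW", "xyz"], while put_after(["abc", "xyz"], 0, 0,
       "NEW\n", False) gives ["aNEW", "bc", "xyz"].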
       
       So which addresses are line addresses?  That depends on what your
       command is.
       
       Besides the three commands I cited as examples above, there are
       four other, less-used commands -- ! < > = -- that also take
       addresses.  The only thing you have to know right now about these
       four commands is that they can act only on entire lines; that's
       inherent in what they do.  So with these four commands, every
       address is a line address.  (Except a handful of addresses, such
       as "f", that cannot be used with these commands at all.)
       
       With the three more-used commands c d y or with an address used
       by itself to move the cursor, an individual address is either
       always a line address or always a character address -- usually.
       There are exceptions to this rule also, such as the address
       "j", which is a character address when you are just moving the
       cursor, but a line address to any command.
       
       So just where does a character address take you?  When you are
       just moving around in the file, the cursor lands on the character
       that is the target you sought.  Or if the target was a string of
       characters, the character address puts the cursor on the first of
       these.
       
       When you are using a character address with a command, the
       situation is more complex.  The one firm rule is that if the
       character address is farther down in the file than the cursor
       position, the cursor position is included in the range the
       command affects; while if the address target is earlier in the
       file than the cursor, the cursor position is not included in the
       range.
       
       The question of whether the address target is included in the
       command's range, like all the other open questions raised in the
       last few paragraphs, will have to be answered separately for each
       address. (But the usual rule is that if the address target is
       forward of the cursor, the target is not included; if the target
       lies backward from the cursor, the target is included.)
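       The usual rule can be written as a half-open slice of the line.
       This Python sketch (the names char_range and delete_to are
       hypothetical, and columns are 0-based) covers only that usual rule.
       Addresses such as f, discussed below, include the target even when
       it lies forward, so they would need their own version.

```python
def char_range(cursor: int, target: int):
    """Half-open (start, end) slice affected under the usual rule.

    Forward target: the cursor character is included, the target is not.
    Backward target: the target character is included, the cursor is not.
    """
    if target >= cursor:
        return cursor, target
    return target, cursor


def delete_to(line: str, cursor: int, target: int) -> str:
    """Model of d with a character address that follows the usual rule."""
    start, end = char_range(cursor, target)
    return line[:start] + line[end:]
```

       Deleting from "b" forward to "e" in "abcdef", or from "e" backward to
       "b", removes the same three characters, "bcd", in both cases.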
       
       Note also that a count given with any of these seven commands is
       passed to the address.  You may give the count before or after
       the command character itself, but always before the address.
       What the address does with the count, if anything, is also a
       case-by-case question.
       
       USEFUL ADDRESSES 
       
       There are four addresses that together resemble a miniaturized,
       localized version of the / and ? search patterns. In each case,
       the search takes place only in the current line, and only for a
       single character.  To use any of them, you type one of the four
       letters designating the kind of inline search, immediately
       followed by the character to be searched for.  (There are no
       metacharacters used with these addresses.)
       
       The letter "f" means that the search will go forward in the
       current line and stop on the character typed next.  "F" makes
       the search run backward within the current line, otherwise the
       same as "f".  A "t" search is the same as an "f" search
       except that the search stops with the character just short of the
       one you type after the "t", and a "T" search is like a "t"
       search but running backward within the current line.  Any of
       these addresses can take a preceding count, which tells the
       search not to stop at the first instance of the character sought,
       but to go on to the nth, where n is the count.
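       The following Python sketch models where each of the four inline
       searches leaves the cursor, including the count. The name
       inline_search and the 0-based columns are assumptions for
       illustration. Editor corner cases, such as t when the target
       character is right next to the cursor, are not modeled.

```python
def inline_search(line: str, col: int, kind: str, ch: str, count: int = 1):
    """Return the 0-based column where f/F/t/T{ch} would leave the cursor.

    kind is "f", "F", "t" or "T"; col is the current 0-based cursor column.
    Returns None when there are fewer than `count` matches, in which case
    the cursor stays where it was.
    """
    step = 1 if kind in ("f", "t") else -1
    stop = len(line) if step == 1 else -1
    # Positions of matching characters, nearest first, in the search direction.
    hits = [i for i in range(col + step, stop, step) if line[i] == ch]
    if len(hits) < count:
        return None
    target = hits[count - 1]
    if kind == "t":
        return target - 1   # forward: stop one short of the match
    if kind == "T":
        return target + 1   # backward: "one short" means just after the match
    return target
```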
       
       All of these search commands, including the repeat-search
       commands mentioned below, are character addresses and can be used
       as an address for any of the three range commands (c d y) that do
       not require a line address.  In every case, the character on which
       the cursor would have landed had there been no command is the
       furthest character included in the range the command will affect.
       
       A few examples.  "Fp" would cause a search that went backward and
       landed on the closest prior letter "p".  "3f-" would make the
       search run forward within the current line and stop on the third
       instance of a hyphen.  "2T " would cause a backward search that
       ended one character short of the second closest space character.
       
       This search system has its own repeat-search characters, which
       use storage buffers completely independent of those used for
       storing previous / and ? search strings.  A semicolon ";" repeats
       the last inline search, in the same direction.  A comma ","
       repeats the last search but reverses the direction.  Any count to
       the original search is not included in the repeat, but you can
       give a count to either repeat character which will be passed to
       the search command that is repeated.  While a search is limited
       to the current line, you can run a search, move to another line,
       then use a semicolon or comma to repeat the original search on
       the new line.
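       Because ; and , keep their own memory, separate from / and ?, they
       can be modeled as a small object that remembers only the kind of
       search and the character, never the count. This is a hypothetical
       Python sketch (the class and method names are invented for
       illustration), not vi's implementation.

```python
# Which direction "," switches each kind of search to.
REVERSED = {"f": "F", "F": "f", "t": "T", "T": "t"}


class InlineSearchMemory:
    """Hypothetical model of the separate memory behind ; and ,."""

    def __init__(self):
        # Only the kind of search and the character are kept, never the count.
        self.last = None

    def search(self, line, col, kind, ch, count=1):
        """An explicit f/F/t/T search; remembered for later repeats."""
        self.last = (kind, ch)
        return self._find(line, col, kind, ch, count)

    def semicolon(self, line, col, count=1):
        """; repeats the last search in the same direction."""
        kind, ch = self.last
        return self._find(line, col, kind, ch, count)

    def comma(self, line, col, count=1):
        """, repeats the last search in the opposite direction."""
        kind, ch = self.last
        return self._find(line, col, REVERSED[kind], ch, count)

    @staticmethod
    def _find(line, col, kind, ch, count):
        step = 1 if kind in ("f", "t") else -1
        stop = len(line) if step == 1 else -1
        hits = [i for i in range(col + step, stop, step) if line[i] == ch]
        if len(hits) < count:
            return None              # failed search: cursor does not move
        target = hits[count - 1]
        if kind == "t":
            return target - 1
        if kind == "T":
            return target + 1
        return target
```

       Since the memory is tied to no particular line, a repeat works just
       as well after moving to a different line.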
       
       Another very useful address that operates within a single line
       is the vertical bar "|".  When preceded by a count, this address
       takes the cursor to the nth character on the current line, where
       n is the count, regardless of where the cursor was when the
       address was given.  (In this address, n is absolute, not
       relative, starting from character one at the left edge of the
       text.)
       
       This address can also be used with a command.  If the target
       character position is forward from the cursor position, the
       furthest character affected will be the last one before the
       target character.  If the target is backward from the cursor, the
       target character as well as all those between it and the cursor
       will be affected by the command.  
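       Both behaviors of | can be sketched in a few lines of Python. The
       names bar_target and bar_delete are hypothetical, columns are 1-based
       as in the text, and the sketch assumes a line with no tab
       characters, since | counts screen columns and a tab occupies several
       of them.

```python
def bar_target(line: str, n: int) -> int:
    """1-based column where n| leaves the cursor; silently clamps to the
    last column when the line is shorter than n."""
    return max(1, min(n, len(line)))


def bar_delete(line: str, cursor: int, n: int) -> str:
    """Model of d with an n| address; cursor and n are 1-based columns."""
    target = bar_target(line, n)
    if target > cursor:
        start, end = cursor, target - 1     # forward: target itself not included
    else:
        start, end = target, cursor - 1     # backward: target included, cursor not
    return line[:start - 1] + line[end:]
```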
       
       Editing in Columns
       
       Although the Vi/Ex editor was not specifically designed to deal
       with columnar material, there are ways to use it effectively for
       this kind of work.  Your choice of techniques will depend on
       whether you are dealing with single-character columns wherein
       each character in a line is in a separate column, or
       multi-character columns where the columns are set apart from each
       other by a separator character.
       
       SINGLE-CHARACTER COLUMNS
       
       Here I'm using "columns" the way most programmers do.  A column
       in this sense is simply the characters in a vertical section of a
       file, one character wide.  That is, the first character on each
       line of the file is in the first column, the second character of
       each line is in the second column, and so on.  You'll find this
       usage in systems that use punch-card images, such as early
       Fortran programs; in the blocked records in certain databases,
       such as the ones used for very large mailing lists; etcetera.
       
       The essential point is that the systems that use these records
       absolutely depend on each piece of information being entirely
       within a certain column or range of columns, and nothing else
       being within those columns except padding characters to fill up
       any column positions not needed for the information in a
       particular record.
       
       For example, a mailing list may require that a suite or apartment
       number be in columns 122 through 125 in each record (line), with
       any padding following the actual number, so that an address
       printing program that finds "316 " in those columns will
       print ", #316" at the end of the street address line.  If it
       finds "3A  " it will then print ", #3A", etcetera.
       Should the suite number be even partially shifted out of the
       designated columns, the system will either print garbage as the
       suite number or issue an error message and skip that address
       altogether.  The principle is the same, and even more important,
       with computer programs in punch-card image form.  
       
       When you are making changes in existing records, and editing
       visually, the first important point is to be sure you are at the
       start of the particular field you need to modify.  The "|"
       address I've explained above takes care of that -- wherever you
       are in a line, typing 122| brings the cursor to the 122nd column.
       If the line has fewer than 122 columns, though, the cursor
       will be placed in the last column that does exist, without any
       warning or error message.  But files of this sort have generally
       been checked for exact block sizing, and if yours have not been,
       it's easy to check visually.
       
       To check visually that all the lines in the file are of the
       proper length, start by running a :se list command, which will
       display a dollar sign at the end of each file line. Then scan
       through the file to check that all those dollar signs are aligned
       vertically.  If so, then check that the uniform line length is
       the correct one -- if your line length should be 66 characters
       (not counting the nonvisible newline), then run a 65| command on
       any line, and make sure that the cursor lands one column away
       from the end of the line.
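       If the file is too long to scan comfortably by eye, a short script
       can make the same check. This Python sketch (the function name is
       hypothetical) lists every record whose length differs from the
       expected one. It counts characters, so it assumes plain records with
       no tabs.

```python
def check_record_lengths(lines, expected: int):
    """Return (line_number, length) for every record not `expected` long.

    Lines are assumed to have had their newline removed, matching the
    text's "not counting the nonvisible newline", and to contain no tabs.
    """
    return [(number, len(text))
            for number, text in enumerate(lines, start=1)
            if len(text) != expected]
```

       An empty result means every record has the expected length.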
       
       When you are at the start of the field to be changed, you have a
       choice of ways to change it.  If the change area is 12 characters
       long, then typing 12cl followed by the 12 new characters and then
       the escape key will do it.  But if you miss the count by even one
       character (if the actual number of characters you type in is 11
       or 13), then all the subsequent fields on that line will be
       shifted one character out of place, which is probably a recipe
       for disaster.
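       The danger is easy to see in a model. In this Python sketch (the
       name change_chars is hypothetical, and columns are 0-based), a
       12-character field is changed but only 11 characters are typed, so
       the field that followed it moves one column to the left.

```python
def change_chars(line: str, col: int, count: int, typed: str) -> str:
    """Model of {count}cl followed by `typed` and Esc (col is 0-based).

    Exactly `count` characters are removed, but whatever is typed goes in,
    so a typed string of the wrong length shifts everything after it.
    """
    return line[:col] + typed + line[col + count:]
```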
       
       To avoid this hazard, make use of the little-known R command.  It
       starts like the familiar r command, in that when you type the
       letter "R" in visual command mode the system waits to see what
       character you type next, and whatever that next character is, it
       replaces the character that was under the cursor.  But instead of
       then returning you to command mode, the R command then moves the
       cursor one character to the right and again waits to see what
       character you type next -- the character you now type replaces
       the character that is now under the cursor.  This process
       continues until you stop it by hitting the escape key.  So if
       your cursor is on the capital P in the following line:
       
         but the greatest ancient Greek was Plato, who
       
       and you type in "RHomer" followed by the escape key, your line will
       now read:
       
         but the greatest ancient Greek was Homer, who
       
       and the cursor will be on the letter r at the end of "Homer".
       This character-at-a-time replacement is the way to make sure you
       don't inadvertently shift any fields.  Just be certain that you
       don't keep typing replacement characters beyond the existing end
       of the line; you would extend the line length that way.  You can
       give a count to the R command, but you don't want to in this use
       because the count will multiply the number of times the new
       character string is inserted.  That is, in that example above
       about replacing "Plato" with "Homer", if you had typed 3R instead
       of R your revised line would read:
       
         but the greatest ancient Greek was HomerHomerHomer, who
       
       Entering completely new lines of information is another matter.
       You should just type them straight across, as you would with any
       text entry, but if the existing lines are cryptic to human eyes
       you may not be able to tell by looking just where one field ends
       and another begins. You can try to keep count of the characters,
       of course, but a single mistake will throw all the subsequent
       fields in that line out of position.
       
       What you need here is an on-screen template to show you what goes
       where.  You can make one on the spot, just by typing a template
       line into your file, entering each data line just above it, and
       deleting that template line when you are finished adding lines.
       For example, suppose you are adding to a name file where each
       record (line) starts with a month, day and year, continues with a
       source code (each of the preceding as a two-digit number, with a
       leading zero to pad it if necessary), and then has fields for a
       last name, first name, and middle initial. It would not be
       practical to judge where fields break just by looking at the
       existing data lines, which might look like this: 
       
         07215854von TarekenstuttLeopold  J
         12077338Henderson-Blyth La Toya  P
         10108972Thistlethwaites Geraldine
       
       But a simple template line can clear it all up.  Here is one for the
       job above: 
       
         m|d|y|s|LLLLLLLLLLLLLLL|FFFFFFFF|M
       
       It has mnemonic characters to remind you of what goes in each
       field, and the "|" to indicate the last position of each field
       more noticeably. I've even used a lower-case letter for each
       field that takes numeric characters right justified and zero
       padded, and a capital letter for each field that takes alpha
       characters left justified and space padded.  
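       A template like this is also an exact description of the record
       layout, so a script can use it to split records into fields. The
       Python sketch below (function names hypothetical) follows the
       convention just described: each | occupies the last position of its
       field, and the final field has none.

```python
# The template line from the text, spelled out so the widths are easy to check.
TEMPLATE = "m|d|y|s|" + "L" * 15 + "|" + "F" * 8 + "|M"


def field_spans(template: str):
    """Turn a template line into (label, start, end) slices, 0-based and
    half-open. Each '|' sits in the last position of its field; the final
    field has no bar."""
    pieces = template.split("|")
    spans, start = [], 0
    for i, piece in enumerate(pieces):
        width = len(piece) + (1 if i < len(pieces) - 1 else 0)
        spans.append((piece[0], start, start + width))
        start += width
    return spans


def split_record(record: str, template: str = TEMPLATE):
    """Cut one data line into its fields according to the template."""
    return {label: record[start:end] for label, start, end in field_spans(template)}
```

       Applied to the first sample line, this gives a last-name field of
       "von Tarekenstutt" and a first-name field of "Leopold" followed by
       two padding spaces.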
       
       The way to use this template is to start entering data lines
       immediately above the template line.  That way, as you hit return
       to start a new line, that new line replaces the one you've just
       finished in the position right above the template line.  Yes,
       eventually the template line will be driven down off the bottom
       of the screen, but returning to command mode and typing the
       lower-case letter "z" followed by the return key will move the
       template line and the lines around it to the top of the screen.
       
       But there will be times when you don't want to spend time making
       individual changes that you should be able to handle globally.
       Suppose an obsolescent operations code has been replaced, and you
       now need to change every "B27" to "K53" throughout your file, but
       only when the "B27" appears in the operations code columns, which
       are columns 9 through 11. This odd-looking command will do it:
       
         :%s/^\(........\)B27/\1K53
       
       Those eight consecutive dots in the search pattern guarantee that
       a match will occur only when there are exactly eight characters
       between the beginning of the line and the "B27".  So of
       necessity, the "B" must occur in column 9, and so on.  The "\1"
       puts those eight characters right back in again, so only the
       "B27" is actually replaced.
       
       If your columnar file has all lines of equal length, as most do,
       you can use this technique from the right side, too.  If all
       lines in the file have 66 characters, then typing that last
       command as:
       
         :%s/B27\(...\)$/K53\1
       
       will accomplish the changes in a case where the operations code
       columns are 61 through 63, without the need to type (and
       carefully count) sixty consecutive dots.
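       For anyone processing such a file outside the editor, both
       substitutions translate directly to Python's re module. In this
       sketch (function names hypothetical), .{8} stands in for the eight
       dots and \g<1> plays the role of \1. The 66-character record length
       for the right-anchored version is the same assumption the text makes.

```python
import re

# :%s/^\(........\)B27/\1K53  -- B27 only when it starts in column 9.
AT_COLUMN_9 = re.compile(r"^(.{8})B27")

# :%s/B27\(...\)$/K53\1  -- B27 only when it is followed by exactly three
# characters (columns 61-63 of a 66-character record).
AT_COLUMNS_61_63 = re.compile(r"B27(...)$")


def recode_left(lines):
    return [AT_COLUMN_9.sub(r"\g<1>K53", line) for line in lines]


def recode_right(lines):
    return [AT_COLUMNS_61_63.sub(r"K53\g<1>", line) for line in lines]
```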
       
       But there will be times when the columns to be changed are in the
       middle of horrendously long record lines.  There are still a
       couple of tricks you may be able to use.  One is to find a
       landmark somewhere in mid-line.  Does column 158 always contain
       either a "*" or a "|" character, neither of which can appear
       anywhere else in the lines?  Then you can make the above change
       in columns 163 through 165 by typing:
       
         :%s/\([*|]....\)B27/\1K53
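       The landmark version translates the same way. In this Python sketch
       (hypothetical function name), * and | inside the brackets are
       literal characters, and count=1 mirrors the ex command, which
       without a g flag changes only the first match on each line.

```python
import re

# :%s/\([*|]....\)B27/\1K53  -- the landmark is a * or | four columns
# before the B27 (column 158 when B27 is in columns 163-165).
AFTER_LANDMARK = re.compile(r"([*|]....)B27")


def recode_after_landmark(lines):
    # count=1: like the ex command without a g flag, change only the first match.
    return [AFTER_LANDMARK.sub(r"\g<1>K53", line, count=1) for line in lines]
```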
       
       Failing a landmark, let the editor count out a long string of
       dots for you.  To use this technique, you must first create your
       substitution command as a text line within the file you are
       editing, next write that line as a separate file (and then delete
       the command line from your original file), and finally use the
       :so command to pull in that one-line file and run it as a
       line-mode command. If you need a string of 92 consecutive dots in
       your command, create a blank line at the end of your file, next
       type:
       
         :1,92g/^/$s/^/.
       
       to put those 92 dots there (the file must already have at least 92
       lines, or ex will reject the range), and finally put the rest of
       the command around that dot string.
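       If a scripting language is at hand, it can stand in for the :g trick
       and count the dots for you. This Python sketch (column_substitute is
       a hypothetical name) builds the command line for a plain string
       starting in any 1-based column; written into a file, that line can
       then be pulled in with :so. The leading colon is omitted, since
       commands in a :so file do not need one, and the string being
       replaced is assumed to contain no regular-expression metacharacters.

```python
def column_substitute(column: int, old: str, new: str) -> str:
    """Build an ex substitute line that changes `old` to `new` only where
    `old` starts in the given 1-based column, by preceding it with
    column - 1 dots. `old` is assumed to contain no regex metacharacters.
    """
    dots = "." * (column - 1)
    return r"%s/^\(" + dots + r"\)" + old + r"/\1" + new
```

       For column 9 this reproduces the eight-dot command shown earlier,
       and for column 93 it produces the 92-dot string from the example.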
       
       MULTI-CHARACTER COLUMNS
       
       The other meaning of "editing in columns" has to do with text
       rather than data files.  It refers to tables of data such as you
       might find accompanying a technical article, columns of text
       and/or illustrations running in parallel as you'd find on a
       newspaper page, and the like.
       
       Yes, Unix formatting utilities and some word processing programs
       will format your final output into columns.  But you may not have
       all these utilities, you may not want to spend time trying to get
       the results you want from those benighted programs, or you may
       plan to direct your output where formatters won't work.  
       
       Visually editing the columns of data in a table requires little
       explanation.  The one thing to remember: use the R command as far as
       possible, to avoid shifting subsequent columns out of alignment
       inadvertently.  This holds for creating tables, too; start by
       setting up a rectangular block of space characters, then replace
       spaces with the column entries you want, to keep your next entry
       from misaligning previous ones.  This is also the best way to
       create pictures, diagrams, graphs and maps using ASCII
       characters.
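       The block-of-spaces approach amounts to overwriting characters inside
       a fixed rectangle. The Python sketch below (hypothetical names)
       models it. Unlike R, overlay deliberately drops any text that would
       run past the right edge, so it can never lengthen a line; the
       starting column is assumed to lie inside the block.

```python
def blank_block(rows: int, width: int):
    """A rectangular block of spaces to type into, like the one set up in vi."""
    return [" " * width for _ in range(rows)]


def overlay(block, row: int, col: int, text: str):
    """Overwrite characters starting at 0-based (row, col), R-style.

    Text that would run past the right edge of the line is dropped, so the
    line length (and every column to the right) never changes.
    """
    line = block[row]
    end = min(len(line), col + len(text))
    block[row] = line[:col] + text[:end - col] + line[end:]
    return block
```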
       
       Things become problematic when you want to shift whole columns
       around -- there are no built-in Vi facilities for doing this.
       Here is what it is practical to do in the editor.  As a real life
       example, consider the piece below, which I use as the tail end of
       Usenet (Net news) posts that announce Indonesian classical music
       and dance performances at a local restaurant:
       
       It's at the Dutch East Indies    ;,,,,;,,,,;,,,,;,,,,;
       Restaurant       on Oakland's   /%%%%%%%%%%%%%%%%%%%%%\
       downtown waterfront.  The      /%%%%%%%%%%%%%%%%%%%%%%%\
       food    there is very good      "|""|"""|"""""|"""|""|"
       Indonesian cuisine at           _|__|___|_   _|___|__|_
       reasonable prices - dinners     =|==|===|=====|===|==|=
       $8.95 to $17.50.   Views are   ~~~~~~~~~~~( (~~~~~~~~~~~
       spectacular from the second                ) )
       floor    picture windows, out
       over the water to Jack London Square, Alameda and San Francisco.
       Formality is medium - cloth napkins and oil    candles at the
       tables, but no supercilious waiters, and the wall decorations are
       mostly Indonesian handicrafts. The phone number for information
       and reservations is 510/444-6555.
       
        ( ( (             | Broadway ||I   The Dutch East Indies
         ) ) )Jack London |==========||==  Restaurant is in Jack London
        ( ( ( Square      |E         ||8   Village, a boutiques &
         ) ) )            |m         ||8   bistros cluster that is just
        ( ( ( JACK LONDON |b         ||0   down the estuary from Jack
         ) ) )VILLAGE     |a         ||    London Square.  Jack London
        ( ( (        Alice|r Amtrak  ||f   Village is rustic,
         ) ) ) -----------|c station ||r   picturesque, quiet and safe.
        ( ( (       Street|a         ||e   To get there from the
         ) ) )            |d  Jackson||e   Interstate 880 freeway
        ( ( ( parking lot |e   ------||--  heading north, take the Oak
         ) ) )            |r   Street||w   Street exit and turn left;
        ( ( (             |o         ||a   five blocks will bring you
         ) ) )            |          ||y   to Embarca- dero on your
        ( ( (           -------------||--  right, just before Oak
         ) ) )             Oak Street||    curves away to the left.
       
       (Going south on I-880, take the Jackson Street exit and go two
       blocks straight ahead before you turn right on Oak Street.)  Turn
       right onto Embarcadero and go three blocks, until you go under an
       overpass of Victorian ironwork. Immediately turn left onto Alice
       Street, where you will see Jack London Village on your right, and
       a large lot that offers validated parking on the left. Walk into
       the Village's central courtyard, and you'll see the Dutch East
       Indies on the estuary side, toward the right, and upstairs.
       
       To create this, I started by drawing the stylized building and
       then the map.  In each case I created a large rectangular block
       of space characters, then began trying ideas with the R command
       until I had something that satisfied me.  (The pavilion sketch
       eventually became wider than I had planned, so I had to run a
       :%s/.*/   &   / command to give me more working space.)  Next I
       put additional blocks of space characters on the left of the
       drawing and the right of the map, to make a place for the text I
       wanted to include.  Then I started replacing spaces with text,
       rewriting the text as I went along to fit it in nicely. When the
       text reached the bottom of the figure I was fitting it to, I went
       to full-width text lines, entering them the usual way.  A tedious
       labor, but pretty straightforward.
       
       Now suppose I decided to redo this piece, by moving the picture
       to where the map is now, and vice versa.  A few well chosen
       substitution and deletion commands would make copies of the two
       figures minus the text, and I could just as easily copy the text
       without the two figures. But how would I recombine them?
       
       Short of typing the text in again from scratch, the best I could
       do is to yank the lines of each figure, one at a time, and put
       them after (or before) the appropriate text lines, one at a time.
       Not that I would have to move back and forth between files with
       each yank and put; I could yank up to 26 lines into the named
       buffers, then move to the other file and put all 26 in their
       proper places.  But there is no Vi command to yank a rectangular
       block of characters.
       
       Also take note that I should yank using addresses that are not
       line addresses, even though I will be yanking whole lines.  If I
       should yank with line addresses, putting the pieces into the
       other file must make those pieces separate lines -- then I would
       have to join each pair of lines to create the columns I want.
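       What vi lacks here, a rectangular cut and a side-by-side join, takes
       only a few lines outside the editor. This is a Python sketch of that
       alternative, not the yank-and-put method described above; the names
       cut_columns and side_by_side are hypothetical, and columns are
       1-based as in the text.

```python
def cut_columns(lines, first: int, last: int):
    """Rectangular "yank": 1-based columns first..last of every line, padded
    with spaces so short lines still give a full-width piece."""
    width = last - first + 1
    return [line[first - 1:last].ljust(width) for line in lines]


def side_by_side(left, right):
    """Join line i of `left` to line i of `right`, padding the shorter block
    with blank lines of the right width so the columns stay aligned."""
    left_width = max((len(line) for line in left), default=0)
    rows = max(len(left), len(right))
    left = left + [""] * (rows - len(left))
    right = right + [""] * (rows - len(right))
    return [l.ljust(left_width) + r for l, r in zip(left, right)]
```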
       
       Next Time Around
       
       In the next part of this tutorial, I will go over a host of
       complications and opportunities that come from allowing the
       replacement commands I've discussed to use metacharacters. Then
       I'll answer a couple of questions from readers that should be of
       use to quite a few of you from time to time.
       