Like the contents of files, file names can contain control and other non-printing characters in addition to the alphanumeric characters, punctuation marks, and other "printing" symbols used in English.
As a rule, such characters in file names are unanticipated and create confusion for users, applications, and even system software.
Simply removing them with the "normal" set of Unix commands can be a challenge; hence this document.
The table below shows the ASCII character set: each 3-digit octal number (code) is followed by its interpretation or by the printable symbol it represents. For instance, the letter A is encoded by octal 101, and backspace, bs, by 010.
Octal codes 000-037 and 177 encode non-printing characters.
ASCII Character Set (codes given in octal)
| +0 | +1 | +2 | +3 | +4 | +5 | +6 | +7 |
|---|---|---|---|---|---|---|---|
| 000 nul | 001 soh | 002 stx | 003 etx | 004 eot | 005 enq | 006 ack | 007 bel |
| 010 bs | 011 ht | 012 nl | 013 vt | 014 np | 015 cr | 016 so | 017 si |
| 020 dle | 021 dc1 | 022 dc2 | 023 dc3 | 024 dc4 | 025 nak | 026 syn | 027 etb |
| 030 can | 031 em | 032 sub | 033 esc | 034 fs | 035 gs | 036 rs | 037 us |
| 040 sp | 041 ! | 042 " | 043 # | 044 $ | 045 % | 046 & | 047 ' |
| 050 ( | 051 ) | 052 * | 053 + | 054 , | 055 - | 056 . | 057 / |
| 060 0 | 061 1 | 062 2 | 063 3 | 064 4 | 065 5 | 066 6 | 067 7 |
| 070 8 | 071 9 | 072 : | 073 ; | 074 < | 075 = | 076 > | 077 ? |
| 100 @ | 101 A | 102 B | 103 C | 104 D | 105 E | 106 F | 107 G |
| 110 H | 111 I | 112 J | 113 K | 114 L | 115 M | 116 N | 117 O |
| 120 P | 121 Q | 122 R | 123 S | 124 T | 125 U | 126 V | 127 W |
| 130 X | 131 Y | 132 Z | 133 [ | 134 \ | 135 ] | 136 ^ | 137 _ |
| 140 ` | 141 a | 142 b | 143 c | 144 d | 145 e | 146 f | 147 g |
| 150 h | 151 i | 152 j | 153 k | 154 l | 155 m | 156 n | 157 o |
| 160 p | 161 q | 162 r | 163 s | 164 t | 165 u | 166 v | 167 w |
| 170 x | 171 y | 172 z | 173 { | 174 \| | 175 } | 176 ~ | 177 del |
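Because octal codes 000-037 and 177 are exactly the characters in the C locale's "cntrl" class, a shell pattern can detect them. The sketch below is a minimal illustration in Bourne-shell syntax (sh, bash, ksh), not csh; the function name has_cntrl is hypothetical, and it assumes a shell whose case patterns understand POSIX character classes. It only reports names; it changes nothing.

```sh
#!/bin/sh
# has_cntrl NAME (hypothetical helper): succeed if NAME contains an ASCII
# control character, i.e. a code in 000-037 or 177 octal.
has_cntrl() {
    case $1 in
        *[[:cntrl:]]*) return 0 ;;
        *) return 1 ;;
    esac
}

# Report every entry in the current directory (dot files included) whose
# name contains such a character; od -c shows its bytes. Nothing is changed.
for f in * .[!.]* ..?*; do
    [ -e "$f" ] || [ -L "$f" ] || continue   # skip patterns that matched nothing
    if has_cntrl "$f"; then
        printf '%s\n' "$f" | od -c
    fi
done
```

In locales other than C, the class may also match some non-ASCII characters.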
The backspace character (CONTROL-H, ^H, or \010) is a common culprit. It usually gets into a file name by accident: your terminal is misconfigured, and you press the backspace key while typing the name.
For instance, the file name
aaa^H^H^Hzzz
was intended by the user to be
zzz
It was created by typing "aaa", pressing the backspace key 3 times in an attempt to delete the "aaa", and typing "zzz". (A Unix file copy command such as "cp a.out aaa^H^H^Hzzz" would do it.) Because the terminal was misconfigured, the codes for the backspaces were included in the file name.
To make life interesting, the "ls" or "ls -l" command shows the file name as
zzz
because the three backspaces are interpreted, "backspacing" over the "aaa" on the screen, so that only "zzz" appears. (As described below, "ls -b" or "ls -q" will display, and not interpret, control characters.)
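To experiment without a misconfigured terminal, you can create the same name deliberately in a scratch directory. A minimal sketch in Bourne-shell syntax, assuming mktemp supports -d (as on GNU and BSD systems); the variable names demo and bad are illustrative, and printf's \b escape stands in for the backspace key.

```sh
#!/bin/sh
# Sketch: recreate the aaa^H^H^Hzzz file in a throwaway directory.
demo=$(mktemp -d) || exit 1      # scratch directory (mktemp -d assumed)

# printf turns each \b into a real backspace (octal 010), producing the
# name a misconfigured terminal would have passed to cp.
bad=$(printf 'aaa\b\b\bzzz')
touch -- "$demo/$bad"

# Show the exact bytes of the name; ls appends the final newline (012).
ls "$demo" | od -b
```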
Other unusual characters, like ^B, ^M, and ^], make an occasional appearance in file names.
The ls command has all the options you need to detect "hidden" characters:
| Command | Function | Sample Output |
|---|---|---|
| ls | hides/interprets nongraphic characters | zzz |
| ls -b | prints octal codes for nongraphic characters | aaa\010\010\010zzz |
| ls -q | prints question marks for nongraphic characters | aaa???zzz |
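The three listings can be compared on a scratch copy of the file. This is a sketch in Bourne-shell syntax, assuming mktemp -d is available. Note that ls -b output is system-dependent: some versions print octal codes such as \010, while GNU and macOS ls print C-style escapes such as \b.

```sh
#!/bin/sh
demo=$(mktemp -d) || exit 1      # throwaway directory
touch -- "$demo/$(printf 'aaa\b\b\bzzz')"

echo 'ls:';    ls    "$demo"    # raw name; how it looks depends on ls and the terminal
echo 'ls -b:'; ls -b "$demo"    # escape codes for the backspaces
echo 'ls -q:'; ls -q "$demo"    # one ? per nongraphic character
```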
We recommend two approaches, both of which require care and caution. If you're unsure, please contact User Support for assistance.
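In a file name typed on the command line, the shell treats each question mark as a wildcard that matches any single character, and it expands the pattern before "mv" runs. "mv -i" asks before overwriting an existing file. Thus, "ls -q" and "mv -i" make a convenient pair: "ls -q" shows a "?" for each hidden character, and typing that same "?" matches it.
Here's the approach to cleaning up a problem file name: list the directory with "ls -q" to see where the hidden characters are, then rename the file with "mv -i", typing a "?" for each hidden character.
The second approach is the definitive method, guaranteed to work. It uses the find command, which is powerful (i.e., "dangerous") and unfamiliar to many. Thus, it's our second choice. Be careful: an accident with "find ... -exec ..." can quickly delete all your files.
Here's the approach: use "ls -i" to get the inode number of the problem file, then substitute that number for <INUM>, and the new name for desired-name, in this command: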
find * \( -type d -prune \) -o -inum <INUM> -exec mv -i {} desired-name \;
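The inode method can be rehearsed in a scratch directory before you use it on real files. This sketch assumes a Bourne-compatible shell, mktemp -d, and a find that supports -inum, as the command above does. It reads the inode number with ls -i, then uses the common idiom "! -name . -prune" to keep find in the current directory; unlike the "*" starting points above, this also examines dot files and does not skip directories. The new name zzz stands in for desired-name.

```sh
#!/bin/sh
demo=$(mktemp -d) || exit 1      # throwaway directory
cd "$demo" || exit 1
touch -- "$(printf 'aaa\b\b\bzzz')"

# Step 1: get the inode number. The ? pattern lets the shell supply the
# real name, so the backspaces never have to be typed.
inum=$(ls -i aaa???zzz | awk '{print $1}')

# Step 2: rename the entry in "." with that inode. "! -name . -prune"
# stops find from descending into subdirectories; mv -i asks before
# overwriting an existing zzz.
find . ! -name . -prune -inum "$inum" -exec mv -i {} zzz \;

ls -b
```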
chilkoot% cd /tmp/marybob
chilkoot% ls -l
total 32
-rw------- 1 marybob wrkgroup 0 Sep 21 10:31
-rw------- 1 marybob wrkgroup 0 Sep 21 09:43 zzz
-rw------- 1 marybob wrkgroup 0 Sep 21filename
drwx------ 2 marybob wrkgroup 4096 Sep 21 09:43 Directory
chilkoot% ls -lq
total 32
-rw------- 1 marybob wrkgroup 0 Sep 21 10:31 ??????
-rw------- 1 marybob wrkgroup 0 Sep 21 09:43 aaa???zzz
-rw------- 1 marybob wrkgroup 0 Sep 21 11:00 ???????ba??filename
drwx------ 2 marybob wrkgroup 4096 Sep 21 09:43 xxx???Directory
chilkoot% mv -i xxx???Directory Directory
chilkoot% mv -i ???????ba??filename filename
chilkoot% mv -i aaa???zzz zzz
chilkoot% rm -i ??????
cmd-3187 rm: r? yve
chilkoot% ls -l
total 32
-rw------- 1 marybob wrkgroup 0 Sep 21 11:00 filename
drwx------ 2 marybob wrkgroup 4096 Sep 21 09:43 Directory
-rw------- 1 marybob wrkgroup 0 Sep 21 09:43 zzz
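The ls -q / mv -i technique can also be scripted. One risk when typing a question-mark pattern is that it may match more than one name, so this sketch (Bourne shell and mktemp -d assumed; the names are illustrative) lets the shell expand the pattern first and renames only when exactly one entry matches.

```sh
#!/bin/sh
demo=$(mktemp -d) || exit 1       # throwaway directory
cd "$demo" || exit 1
touch -- "$(printf 'aaa\b\b\bzzz')"

ls -q                             # each backspace is shown as ?

# Let the shell expand the pattern into the positional parameters,
# then rename only if it matched exactly one existing entry.
set -- aaa???zzz
if [ "$#" -ne 1 ]; then
    echo "pattern matches $# names; not renaming" >&2
elif [ ! -e "$1" ] && [ ! -L "$1" ]; then
    echo "pattern matches nothing; not renaming" >&2
else
    mv -i -- "$1" zzz             # -i: ask before overwriting an existing zzz
fi
```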
Quoting file names that contain characters like ":", ";", "|", and " " (space) keeps the shell from splitting or otherwise interpreting them. For instance, the following command renames a file whose name contains spaces:
chilkoot% mv -i "from a mac" from_a_mac
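When several names contain spaces, a loop can apply the same kind of rename to each. A sketch in Bourne-shell syntax, assuming you want every space replaced by an underscore as in the example; the scratch directory and the second file name are made up for illustration. The double quotes around "$f" keep each name in one piece.

```sh
#!/bin/sh
demo=$(mktemp -d) || exit 1               # throwaway directory
cd "$demo" || exit 1
touch "from a mac" "another mac file"

for f in *' '*; do
    [ -e "$f" ] || continue               # pattern matched nothing
    new=$(printf '%s' "$f" | tr ' ' '_')  # spaces -> underscores
    mv -i -- "$f" "$new"
done
ls
```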
A leading "-" in a file name often looks like a command option to "rm" or "mv". Prepending an explicit path to the file name removes this confusion. For example, the following command removes a file named "-i" from the current working directory ("./"), asking first:
chilkoot% rm -i ./-i
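Besides the explicit "./" path, rm and mv on POSIX-conforming systems accept "--" to mark the end of the options, so a following argument that begins with "-" is taken as a file name. A sketch of both forms in a scratch directory (Bourne shell and mktemp -d assumed; the names are illustrative). It omits -i only so that it runs unattended.

```sh
#!/bin/sh
demo=$(mktemp -d) || exit 1      # throwaway directory
cd "$demo" || exit 1
touch ./-i ./-x                  # two names that look like options

rm ./-i                          # explicit path, as in the example above
mv -- -x x_renamed               # "--" ends option parsing for mv
ls
```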
It can be baffling when white space occurs at the end of a file name. Even "ls -b", for example, won't reveal the trailing space in the file name "space at end ":
chilkoot% ls -lq
total 0
-rw------- 1 marybob wrkgroup 0 Jun 19 17:10 space at end
chilkoot% mv "space at end" newname
space at end - No such file or directory
The "octal dump" or "od" command can show you the exact octal codes that make up your file names, without interpretation. Pipe the output of ls into od -b:
chilkoot% ls | od -b
0000000 163 160 141 143 145 040 141 164 040 145 156 144 040 012
0000016
The last code, 012, is the newline (added by ls). The second-to-last code, 040, is the space. "od -c" interprets the codes and also shows the trailing space:
chilkoot% ls | od -c
0000000   s   p   a   c   e       a   t       e   n   d      \n
0000016
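To reproduce this listing without a real problem file, the sketch below creates "space at end " in a scratch directory and pipes ls into od. It assumes a Bourne-compatible shell and mktemp -d. The od -b line should end with 040 (the space) and 012 (the newline added by ls), as in the listing above.

```sh
#!/bin/sh
demo=$(mktemp -d) || exit 1      # throwaway directory
cd "$demo" || exit 1
touch "space at end "            # note the trailing space inside the quotes

ls | od -b                       # octal codes, one per byte
ls | od -c                       # the same bytes shown as characters
```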
Renaming this file, now that the erstwhile hidden space has been revealed, is simple:
chilkoot% mv -i "space at end " nospace_at_end
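You can also ask the shell directly for names that end in white space and print each one between visible markers. A sketch in Bourne-shell syntax, assuming a shell whose patterns support [[:space:]]; the function name list_trailing_ws and the [ ] markers are illustrative choices.

```sh
#!/bin/sh
# list_trailing_ws (hypothetical helper): print each name in the current
# directory whose last character is white space, wrapped in [ ] so the
# trailing blanks become visible.
list_trailing_ws() {
    for f in *[[:space:]]; do
        # If nothing matches, the pattern is left as-is; skip it.
        [ -e "$f" ] || [ -L "$f" ] || continue
        printf '[%s]\n' "$f"
    done
}

list_trailing_ws
```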
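To keep backspaces out of new file names, set your terminal's erase character to ^H when you log on, for instance by adding these lines to your csh startup file (e.g., .cshrc):
#
# Sets erase character to ^H for terminal sessions,
# doesn't do so for NQS session.
#
if (! ${?ENVIRONMENT}) then
    stty erase '^H'
else
    if ("$ENVIRONMENT" != "BATCH") stty erase '^H'
endif
#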
Or, type this every time you log on:
stty erase '^H'
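If your login shell is sh, ksh, or bash rather than csh, the startup lines above could be written roughly as follows, for instance in ~/.profile. This is a sketch: it assumes, as the csh version does, that batch (NQS) jobs set ENVIRONMENT to BATCH, and it adds a [ -t 0 ] check so stty runs only when standard input is a terminal.

```sh
# Bourne-shell sketch of the csh startup lines, e.g. for ~/.profile.
# Assumes batch (NQS) jobs set ENVIRONMENT=BATCH, as the csh version does.
if [ "${ENVIRONMENT:-}" != BATCH ] && [ -t 0 ]; then
    stty erase '^H'              # treat ^H as the terminal's erase character
fi
```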
Arctic Region
Supercomputing Center
PO Box 756020, Fairbanks, AK 99775 | voice: 907-450-8600 | email: