Removing Non-printing Characters from File Names
Topics:
- Introduction
- What is a "non-printing" character?
- How did a "non-printing" character get into my file name?
- Locating "non-printing" characters in file names
- Extirpating non-printing characters
- Example Cleanup Session
- Other problem characters in file names
- One last trick: od
- Preventing similar problems in the future
Introduction
File names, like files, can contain control and other non-printing characters in addition to the alphanumeric characters, punctuation marks, and other "printing" symbols used in the English language.
As a rule, such characters in file names are unanticipated and create confusion for users, applications, and even system software.
Simply removing them using the "normal" set of Unix commands can be a challenge. Hence, the motivation for this document.
What is a "non-printing" character?
The table below shows the ASCII character set: 3-digit octal numbers (codes) followed by their interpretation or the printable symbol they represent. For instance, the letter A is encoded by the octal number 101. Backspace, bs, is encoded by 010.
Octal codes 000-037 and 177 encode non-printing characters.
ASCII Character Set, (codes given in octal)
000 nul
001 soh
002 stx
003 etx
004 eot
005 enq
006 ack
007 bel
010 bs
011 ht
012 nl
013 vt
014 np
015 cr
016 so
017 si
020 dle
021 dc1
022 dc2
023 dc3
024 dc4
025 nak
026 syn
027 etb
030 can
031 em
032 sub
033 esc
034 fs
035 gs
036 rs
037 us
040 sp
041 !
042 "
043 #
044 $
045 %
046 &
047 '
050 (
051 )
052 *
053 +
054 ,
055 -
056 .
057 /
060 0
061 1
062 2
063 3
064 4
065 5
066 6
067 7
070 8
071 9
072 :
073 ;
074 <
075 =
076 >
077 ?
100 @
101 A
102 B
103 C
104 D
105 E
106 F
107 G
110 H
111 I
112 J
113 K
114 L
115 M
116 N
117 O
120 P
121 Q
122 R
123 S
124 T
125 U
126 V
127 W
130 X
131 Y
132 Z
133 [
134 \
135 ]
136 ^
137 _
140 `
141 a
142 b
143 c
144 d
145 e
146 f
147 g
150 h
151 i
152 j
153 k
154 l
155 m
156 n
157 o
160 p
161 q
162 r
163 s
164 t
165 u
166 v
167 w
170 x
171 y
172 z
173 {
174 |
175 }
176 ~
177 del
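Any entry in the table can be checked from the shell by piping a character through od. The following is a minimal sketch, assuming a POSIX shell with printf and od; the helper name octal_of is made up for this example.

```shell
# octal_of: hypothetical helper; prints the octal code of each byte in its argument.
octal_of() {
    # The unquoted $(...) is deliberate: word splitting collapses od's padding.
    echo $(printf '%s' "$1" | od -An -b)
}

octal_of A                    # prints: 101
octal_of "$(printf '\b')"     # backspace, prints: 010
```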
How did a "non-printing" character get into my file name?
The backspace character (CONTROL-H, ^H, or \010) is quite common. It is usually introduced into file names, by accident, when your terminal is misconfigured and you use the backspace during file naming operations.
For instance, the file name
aaa^H^H^Hzzz
was intended by the user to be
zzz
It was created by typing "aaa", hitting the backspace key 3 times in an attempt to delete the "aaa", and typing "zzz". (A Unix file copy command such as "cp a.out aaa^H^H^Hzzz" would do it.) Due to terminal misconfiguration, the codes for the backspaces were included in the file name.
To make life interesting, the "ls" or "ls -l" command shows the file name as
zzz
because the terminal interprets the three backspaces, "backspacing" over the "aaa" on the screen so that only "zzz" appears. (As described below, "ls -b" or "ls -q" will display, and not interpret, control characters.)
Other unusual characters, like ^B, ^M, and ^], make an occasional appearance in file names.
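To experiment without risking real files, such a name can be created on purpose in a scratch directory. This is only a sketch: it assumes a POSIX shell and that mktemp -d is available, and the variable names are chosen for illustration.

```shell
# Work in a throwaway directory so nothing real is touched.
scratch=$(mktemp -d) && cd "$scratch" || exit 1

# printf turns each \b into a real backspace (octal 010).
name=$(printf 'aaa\b\b\bzzz')
: > "$name"                      # create an empty file with that name

# The name holds nine bytes, although a terminal would display only "zzz".
printf '%s' "$name" | wc -c
```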
Locating "non-printing" characters in file names
The ls command has all the options you need to detect "hidden" characters:
| Command | Function | Sample Output |
|---|---|---|
| ls | hides/interprets nongraphic characters | zzz |
| ls -b | prints octal codes for nongraphic characters | aaa\010\010\010zzz |
| ls -q | prints question marks for nongraphic characters | aaa???zzz |
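In a crowded directory it can help to list only the suspicious names. The sketch below assumes a POSIX shell whose patterns support the [[:cntrl:]] character class; the function name list_odd_names is hypothetical, and mktemp -d is used only for the demonstration.

```shell
# list_odd_names: hypothetical helper; prints, in "ls -q" style, every name in
# the current directory that contains a control character.
list_odd_names() {
    for f in *; do
        case $f in
            *[[:cntrl:]]*) ls -dq -- "$f" ;;
        esac
    done
}

# Demonstration in a throwaway directory.
scratch=$(mktemp -d) && cd "$scratch" || exit 1
: > plain
: > "$(printf 'aaa\b\b\bzzz')"
list_odd_names                   # prints: aaa???zzz
```

Names that begin with a dot are not matched by "*", so hidden files would need a second pattern.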
Extirpating non-printing characters
We recommend two approaches, both of which require care and caution. If you're unsure, please contact User Support for assistance.
- Using mv and wild-card characters:
The shell interprets a question mark in a file name as a wild-card character that matches any single character, so the pattern printed by "ls -q" can be passed straight to "mv". "mv -i" inquires before overwriting any file. Thus, "ls -q" and "mv -i" become a convenient pair of commands.
Here's the approach to cleaning up a problem file name:
  - "cd" to the directory.
  - Type "ls -l" to show the file name with control characters hidden. Call this "desired-name". (You might want a different name, which is fine.)
  - Type "ls -lq" to show it with question marks in place of control characters. Call this "wild-card-name". Because "?" also matches ordinary characters, make sure the pattern matches only the file you intend.
  - To rename the file, type: mv -i wild-card-name desired-name
- Using find and inode number:
This is the definitive method, guaranteed to work. It uses the find command, which is powerful (i.e., "dangerous") and unfamiliar to many. Thus, it's our second choice. Be careful: an accident with "find ... -exec ..." can quickly delete all your files.
Here's the approach:
  - "cd" to the directory.
  - Type "ls -l" to show the file name with control characters hidden. Call this "desired-name". (You might want a different name, which is fine.)
  - Type "ls -il" to show it again, this time with the file's inode number, <INUM>. (Within a file system, every file has its own inode number.)
  - Use the following find command, from within the directory in question, to rename the file:
    find * \( -type d -prune \) -o -inum <INUM> -exec mv -i {} desired-name \;
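The inode method can be rehearsed on a throwaway file before it is used on real data. The sketch below assumes a POSIX shell, awk, and mktemp -d. It starts find at "./*" rather than "*" so that a name beginning with "-" is not mistaken for an option; that is a small variation on the command above, not a different method.

```shell
scratch=$(mktemp -d) && cd "$scratch" || exit 1
: > "$(printf 'aaa\b\b\bzzz')"         # problem file: aaa^H^H^Hzzz

# Read the inode number from the first field of "ls -i".
# (The glob is used here only so the script can reach the file.)
inum=$(ls -id aaa???zzz | awk '{ print $1 }')

# Rename whichever non-directory entry carries that inode number.
find ./* \( -type d -prune \) -o -inum "$inum" -exec mv -i {} zzz \;

ls -i                                   # same inode number, now named zzz
```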
Example Cleanup Session
chilkoot% cd /tmp/marybob
chilkoot% ls -l
total 32
-rw------- 1 marybob wrkgroup 0 Sep 21 10:31
-rw------- 1 marybob wrkgroup 0 Sep 21 09:43 zzz
-rw------- 1 marybob wrkgroup 0 Sep 21filename
drwx------ 2 marybob wrkgroup 4096 Sep 21 09:43 Directory
chilkoot% ls -lq
total 32
-rw------- 1 marybob wrkgroup 0 Sep 21 10:31 ??????
-rw------- 1 marybob wrkgroup 0 Sep 21 09:43 aaa???zzz
-rw------- 1 marybob wrkgroup 0 Sep 21 11:00 ???????ba??filename
drwx------ 2 marybob wrkgroup 4096 Sep 21 09:43 xxx???Directory
chilkoot% mv -i xxx???Directory Directory
chilkoot% mv -i ???????ba??filename filename
chilkoot% mv -i aaa???zzz zzz
chilkoot% rm -i ??????
cmd-3187 rm: r? yve
chilkoot% ls -l
total 32
-rw------- 1 marybob wrkgroup 0 Sep 21 11:00 filename
drwx------ 2 marybob wrkgroup 4096 Sep 21 09:43 Directory
-rw------- 1 marybob wrkgroup 0 Sep 21 09:43 zzz
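Each rename in the session works only because its pattern matches exactly one name. A script can make that check explicit before calling mv. This is a sketch under the assumption of a POSIX shell and mktemp -d; the file name is the running example from this document.

```shell
scratch=$(mktemp -d) && cd "$scratch" || exit 1
: > "$(printf 'aaa\b\b\bzzz')"         # problem file: aaa^H^H^Hzzz

# Expand the pattern into the positional parameters and count the matches.
set -- aaa???zzz
if [ "$#" -eq 1 ] && [ -e "$1" ]; then
    mv -i "$1" zzz                      # exactly one match: safe to rename
else
    echo "pattern matched $# names (or none); use a tighter pattern" >&2
fi
```

The extra test with -e covers the case where nothing matches, because the shell then leaves the pattern itself as the single argument.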
Other problem characters in file names
Quoting file names that contain awkward graphic characters, like ":", ";", "|", and " " (space), keeps the shell from interpreting them, so commands receive the name intact. For instance, the following command renames a file whose name contains spaces:
chilkoot% mv -i "from a mac" from_a_mac
A leading "-" in a file name often appears to "rm" or "mv" as a command option. Prepending an explicit path to the file name removes this confusion. For example, the following command removes a file named "-i" from the present working directory ("./"), inquiring first:
chilkoot% rm -i ./-i
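Many commands also accept "--" as an end-of-options marker, which has the same effect as the explicit path. The sketch below shows both forms side by side in a scratch directory; it assumes a POSIX shell and mktemp -d, and the file names are chosen only for illustration.

```shell
scratch=$(mktemp -d) && cd "$scratch" || exit 1
: > ./-i                 # a file literally named "-i"
: > ./-x                 # a file literally named "-x"

rm ./-i                  # explicit path, as described above
rm -- -x                 # "--" ends option parsing, so "-x" is taken as a name
```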
One last trick: od
It can be baffling when white-space occurs at the end of a file name. Even "ls -b" or "ls -q", for example, won't reveal the trailing space in the file name "space at end ":
chilkoot% ls -lq
total 0
-rw------- 1 marybob wrkgroup 0 Jun 19 17:10 space at end
chilkoot% mv "space at end" newname
space at end - No such file or directory
The "octal dump" or "od" command can show you the exact octal codes comprising your file names, without interpretation. Pipe the output of ls into od -b:
chilkoot% ls | od -b
0000000 163 160 141 143 145 040 141 164 040 145 156 144 040 012
0000016
The last code, 012, is the new-line (added by ls). The second to last code, 040, is the space. od -c interprets the codes, and also shows the trailing space:
chilkoot% ls | od -c
0000000   s   p   a   c   e       a   t       e   n   d      \n
0000016
Renaming this file, now that the erstwhile hidden space has been revealed, is simple:
chilkoot% mv -i "space at end " nospace_at_end
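A quicker check than reading octal codes is to print every name between delimiters, so that leading or trailing blanks stand out. This is a minimal sketch assuming a POSIX shell and mktemp -d; the function name show_names is hypothetical.

```shell
# show_names: hypothetical helper; prints each name in the current directory
# between square brackets so leading or trailing blanks become visible.
show_names() {
    for f in *; do
        printf '[%s]\n' "$f"
    done
}

scratch=$(mktemp -d) && cd "$scratch" || exit 1
: > "space at end "
show_names                       # prints: [space at end ]
```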
Preventing similar problems in the future
- If you're using csh, and the file names contain backspaces, marked ^H, you probably need to define "erase". You could add the following to your .login file:
    #
    # Sets erase character to ^H for terminal sessions,
    # doesn't do so for NQS session.
    #
    if (! ${?ENVIRONMENT}) then
      stty erase '^H'
    else
      if (BATCH != $ENVIRONMENT) stty erase '^H'
    endif
    #
  Or, type this every time you log on:
    stty erase '^H'
- You're using modem software, and need to set the "BACKSPACE" option.
- Your TERM environment variable may be (or may have been) set incorrectly.
More Information
- Contact User Support if you encounter any problems.
