Removing Non-printing Characters from File Names

Introduction

It is possible for file names, like files, to contain control and other non-printing characters in addition to the alphanumeric characters, punctuation marks, and other "printing" symbols used in the English language.

As a rule, such characters in file names are unanticipated and create confusion for users, applications, and even system software.

Simply removing them using the "normal" set of Unix commands can be a challenge. Hence, the motivation for this document.

What is a "non-printing" character?

The table below shows the ASCII character set: 3-digit octal numbers (codes) followed by their interpretation or the printable symbol they represent. For instance, the letter A is encoded by the octal number 101. Backspace, bs, is encoded by 010.

Octal codes 000-037 and 177 encode non-printing characters.


ASCII Character Set (codes given in octal)

  
  000 nul  001 soh  002 stx  003 etx  004 eot  005 enq  006 ack  007 bel
  010 bs   011 ht   012 nl   013 vt   014 np   015 cr   016 so   017 si
  020 dle  021 dc1  022 dc2  023 dc3  024 dc4  025 nak  026 syn  027 etb
  030 can  031 em   032 sub  033 esc  034 fs   035 gs   036 rs   037 us
  040 sp   041 !    042 "    043 #    044 $    045 %    046 &    047 '
  050 (    051 )    052 *    053 +    054 ,    055 -    056 .    057 /
  060 0    061 1    062 2    063 3    064 4    065 5    066 6    067 7
  070 8    071 9    072 :    073 ;    074 <    075 =    076 >    077 ?
  100 @    101 A    102 B    103 C    104 D    105 E    106 F    107 G
  110 H    111 I    112 J    113 K    114 L    115 M    116 N    117 O
  120 P    121 Q    122 R    123 S    124 T    125 U    126 V    127 W
  130 X    131 Y    132 Z    133 [    134 \    135 ]    136 ^    137 _
  140 `    141 a    142 b    143 c    144 d    145 e    146 f    147 g
  150 h    151 i    152 j    153 k    154 l    155 m    156 n    157 o
  160 p    161 q    162 r    163 s    164 t    165 u    166 v    167 w
  170 x    171 y    172 z    173 {    174 |    175 }    176 ~    177 del
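
To check an entry in the table from the command line, a minimal sketch along the following lines may help. It assumes a POSIX-style shell with printf, od, and tr available; the function names octal_to_char and char_to_octal are made up for this illustration.

```shell
# Hypothetical helpers for looking up entries in the ASCII table.

# Print the character encoded by a 3-digit octal code, e.g. 101.
octal_to_char() {
    printf "\\$1"
}

# Print the 3-digit octal code of a single character.
char_to_octal() {
    printf '%s' "$1" | od -An -b | tr -d ' \n'
}
```

For example, "octal_to_char 101" prints the letter A, and "char_to_octal A" prints 101.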

How did a "non-printing" character get into my file name?

The backspace character (CONTROL-H, ^H, or \010) is quite common. It is usually introduced into file names, by accident, when your terminal is misconfigured and you use the backspace during file naming operations.

For instance, the file name


  aaa^H^H^Hzzz

was intended by the user to be

  zzz

It was created by typing "aaa", hitting the backspace key 3 times in an attempt to delete the "aaa", and typing "zzz". (A Unix file copy command such as "cp a.out aaa^H^H^Hzzz" would do it.) Due to the terminal misconfiguration, the codes for the backspaces were included in the file name.

To make life interesting, the "ls" or "ls -l" command shows the file name as


  zzz

because the three backspaces are interpreted, "backspacing" over the "aaa" on the screen, so that only "zzz" appears. (As described below, "ls -b" or "ls -q" will display, and not interpret, control characters.)

Other unusual characters, like ^B, ^M, and ^], make an occasional appearance in file names.
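
To experiment safely with such a name, one option is to create it on purpose in a scratch directory. The sketch below assumes a shell whose printf understands \b and a system that provides mktemp -d (common, but not guaranteed everywhere); make_backspace_file is a hypothetical helper name.

```shell
# Create an empty file named aaa^H^H^Hzzz inside a new scratch directory
# and print the path of that directory.
make_backspace_file() {
    dir=$(mktemp -d) || return 1
    name=$(printf 'aaa\b\b\bzzz')   # \b is the backspace character, octal 010
    : > "$dir/$name" || return 1
    printf '%s\n' "$dir"
}
```

Running "ls", "ls -b", and "ls -q" on the printed directory is a low-risk way to see how your own system displays the name.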

Locating "non-printing" characters in file names

The ls command has all the options you need to detect "hidden" characters:

  Command   Function                                          Sample Output
  ls        hides/interprets nongraphic characters            zzz
  ls -b     prints octal codes for nongraphic characters      aaa\010\010\010zzz
  ls -q     prints question marks for nongraphic characters   aaa???zzz
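
When a directory holds many files, scanning the ls output by eye can be tedious. As a rough sketch (not a replacement for ls), the hypothetical function list_suspect_names below prints only the names that contain control characters, masking them with question marks in the style of "ls -q". It assumes a POSIX-style shell that supports the [[:cntrl:]] pattern class and a tr that accepts the '[?*]' repeat form.

```shell
# Print each name in directory $1 that contains a control character,
# with those characters shown as question marks (similar to "ls -q").
list_suspect_names() {
    for f in "$1"/* "$1"/.[!.]*; do
        [ -e "$f" ] || continue            # skip patterns that matched nothing
        name=${f##*/}                      # strip the directory part
        case $name in
            *[[:cntrl:]]*)
                masked=$(printf '%s' "$name" | LC_ALL=C tr '\001-\037\177' '[?*]')
                printf '%s\n' "$masked"
                ;;
        esac
    done
}
```

Calling "list_suspect_names ." checks the current directory; names made up only of printing characters are not listed.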

Extirpating non-printing characters

We recommend two approaches, both of which require care and caution. If you're unsure, please contact User Support for assistance.

  1. Using mv and wild-card characters:

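    The shell interprets each question mark in a file name as a wild-card that matches any single character, and expands the name before "mv" runs. "mv -i" inquires before overwriting any file. Thus, "ls -q" and "mv -i" become a convenient pair of commands. Because a wild-card name can match more than one file, check it with "ls" before renaming.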

    Here's the approach to cleaning up a problem file name:

    1. "cd" to the directory.
    2. Type "ls -l" to show the file name with control characters hidden. Call this "desired-name". (You might want a different name, which is fine.)
    3. Type "ls -lq" to show it with question marks in place of control characters. Call this "wild-card-name".
    4. To rename the file, type: mv -i wild-card-name desired-name
  2. Using find and inode number:

    This is the definitive method, guaranteed to work. It uses the find command, which is powerful (i.e., "dangerous") and unfamiliar to many. Thus, it's our second choice. Be careful: an accident with "find ... -exec ..." can quickly delete all your files.

    Here's the approach:

    1. "cd" to the directory.
    2. Type "ls -l" to show the file name with control characters hidden. Call this "desired-name". (You might want a different name, which is fine.)
    3. Type "ls -il" to show it again, this time with the file's inode number <INUM>. (Every file in a file system has its own, unique inode number.)
    4. Use the following find command, from within the directory in question, to rename the file:
      
       find * \( -type d -prune \) -o -inum <INUM> -exec mv -i {} desired-name \;
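
Before renaming with a wild-card name, it can be worth confirming that the pattern matches exactly one file. The following is a minimal sketch under the assumption of a POSIX-style shell that leaves unmatched patterns unexpanded; count_matches is a hypothetical helper, not a standard command.

```shell
# Print how many names in the current directory match the wild-card
# pattern given as $1. The body runs in a subshell so that the IFS
# change does not leak into the calling shell.
count_matches() (
    IFS=                 # disable word splitting; wild-cards still expand
    set -- $1            # deliberately unquoted so the shell expands it
    # An unmatched pattern is left as-is, so check that the first
    # result really exists before trusting the count.
    if [ -e "$1" ] || [ -L "$1" ]; then
        echo "$#"
    else
        echo 0
    fi
)
```

If "count_matches 'aaa???zzz'" prints 1, then "mv -i aaa???zzz zzz" acts on a single file. A larger number means the pattern is ambiguous and the inode method is the safer choice.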
      

Example Cleanup Session


  chilkoot% cd /tmp/marybob

  chilkoot% ls -l
  total 32
  -rw-------   1 marybob wrkgroup       0 Sep 21 10:31 
  -rw-------   1 marybob wrkgroup       0 Sep 21 09:43 zzz
  -rw-------   1 marybob wrkgroup       0 Sep 21filename
  drwx------   2 marybob wrkgroup    4096 Sep 21 09:43 Directory

  chilkoot% ls -lq
  total 32
  -rw-------   1 marybob wrkgroup       0 Sep 21 10:31 ??????
  -rw-------   1 marybob wrkgroup       0 Sep 21 09:43 aaa???zzz
  -rw-------   1 marybob wrkgroup       0 Sep 21 11:00 ???????ba??filename
  drwx------   2 marybob wrkgroup    4096 Sep 21 09:43 xxx???Directory

  chilkoot% mv -i xxx???Directory Directory

  chilkoot% mv -i ???????ba??filename filename

  chilkoot% mv -i aaa???zzz zzz

  chilkoot% rm -i ??????
  cmd-3187 rm: r? yve 

  chilkoot% ls -l
  total 32
  -rw-------   1 marybob wrkgroup       0 Sep 21 11:00 filename
  drwx------   2 marybob wrkgroup    4096 Sep 21 09:43 Directory
  -rw-------   1 marybob wrkgroup       0 Sep 21 09:43 zzz
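
The session above uses only the wild-card method. For the inode method, a sketch of a small wrapper is shown below. It is a variation on the find command given earlier, not the same command: it searches with "find . ! -name . -prune" so that the search stays in the current directory, names beginning with "-" are not mistaken for options, and directories can be renamed as well. The function name rename_by_inode is hypothetical.

```shell
# Rename the entry in the current directory whose inode number is $1
# to the name given as $2. "mv -i" asks before overwriting.
rename_by_inode() {
    # "! -name . -prune" limits find to the entries directly inside "."
    find . ! -name . -prune -inum "$1" -exec mv -i {} "./$2" \;
}
```

Used after "ls -il" has revealed the inode number, a call such as "rename_by_inode <INUM> zzz" takes the place of step 4 of the second approach. If two names in the directory are hard links to the same inode, both match, so check the "ls -il" listing first.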

Other problem characters in file names

Quoting file names that contain graphic characters like ":", ";", "|", and " " (space) keeps the shell from interpreting those characters. For instance, the following command changes a file name containing spaces:


  chilkoot% mv -i "from a mac" from_a_mac

A leading "-" in a file name often appears to "rm" or "mv" as a command option. Prepending an explicit path to the file name removes this confusion. For example, the following command removes a file named "-i" from the present working directory ("./"), inquiring first:


  chilkoot% rm -i ./-i 
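
In scripts, the same idea can be applied to any name before it is handed to a command. The sketch below uses a hypothetical helper, as_path, that adds "./" only when the name is not already given with a path. Many utilities also accept "--" to mark the end of options (for example "rm -i -- -i"), but the explicit path works even where "--" is not supported.

```shell
# Print $1 with "./" prepended unless it already starts with a path,
# so that a leading "-" cannot be read as a command option.
as_path() {
    case $1 in
        /*|./*|../*) printf '%s\n' "$1" ;;
        *)           printf './%s\n' "$1" ;;
    esac
}
```

For example, rm -i "$(as_path -i)" is equivalent to the "rm -i ./-i" command shown above.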

One last trick: od

It can be baffling when white space occurs at the end of a file name. Even "ls -b", for example, won't show the trailing space in the file name "space at end ":


chilkoot% ls -lq
total 0
-rw-------    1 marybob   wrkgroup      0 Jun 19 17:10 space at end 
chilkoot%  mv "space at end" newname
space at end - No such file or directory

The "octal dump" or "od" command can show you the exact octal codes comprising your file names, without interpretation. Pipe the output of ls into od -b:


chilkoot% ls | od -b
0000000 163 160 141 143 145 040 141 164 040 145 156 144 040 012
0000016

The last code, 012, is the new-line (added by ls). The second-to-last code, 040, is the space. od -c interprets the codes, and also shows the trailing space:


chilkoot% ls | od -c
0000000   s   p   a   c   e       a   t       e   n   d      \n
0000016

Renaming this file, now that the erstwhile hidden space has been revealed, is simple:


chilkoot% mv -i "space at end "  nospace_at_end
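
When a directory contains several files, the combined "ls | od" dump runs the names together. As a sketch, assuming a POSIX-style shell and an od that accepts -An (suppress the offset column) and -b, the hypothetical function show_name_bytes dumps each name separately, without the new-line that ls appends:

```shell
# For every entry in directory $1, print the octal code of each byte
# in its name. Each name gets its own od dump.
show_name_bytes() {
    for f in "$1"/*; do
        [ -e "$f" ] || continue          # skip a pattern that matched nothing
        printf '%s' "${f##*/}" | od -An -b
    done
}
```

A dump that ends in 040 indicates a name with a trailing space.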

Preventing similar problems in the future

  • If you're using csh, and the file names contain backspaces, marked ^H, you probably need to define "erase". You could add the following to your .login file:
    
            #
            # Sets erase character to ^H for terminal sessions,
            # doesn't do so for NQS session.
            #
            if (! ${?ENVIRONMENT}) then
              stty erase '^H'
            else
              if (BATCH != $ENVIRONMENT) stty erase '^H'
            endif
            #
    
    Or, type this every time you log on:
    
           stty erase '^H'
    
  • If you're using modem software, you may need to set the "BACKSPACE" option.
  • Your TERM environment variable may be (or may have been) set incorrectly.
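
For users of sh-style shells, a rough equivalent of the csh fragment above might look like the sketch below, placed in a login file such as .profile. It rests on two assumptions: that batch jobs set ENVIRONMENT to BATCH, as the csh example implies for NQS, and that checking for a terminal on standard input is an acceptable test for an interactive session. The function name set_erase_if_interactive is hypothetical.

```shell
# Set the erase character to ^H, but only for terminal sessions.
set_erase_if_interactive() {
    # Skip batch jobs (ENVIRONMENT=BATCH, assumed from the csh example)
    # and any session whose standard input is not a terminal.
    if [ "${ENVIRONMENT:-}" != BATCH ] && [ -t 0 ]; then
        stty erase '^H'
    fi
}

set_erase_if_interactive
```

Running "stty -a" afterwards shows the current erase setting.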
