
Removing Non-printing Characters from File Names


Introduction

File names, like files, can contain control and other non-printing characters in addition to the alphanumeric characters, punctuation marks, and other "printing" symbols used in the English language.

As a rule, such characters in file names are unanticipated and create confusion for users, applications, and even system software.

Removing them with the "normal" set of Unix commands can be a challenge, which is the motivation for this document.

What is a "non-printing" character?

The table, below, shows the ASCII character set: 3-digit octal numbers (codes) followed by their interpretation or the printable symbol they represent. For instance, the letter A is encoded by the octal number 101. Backspace, bs, is encoded by 010.

Octal codes 000-037 and 177 encode non-printing characters.

ASCII Character Set (codes given in octal)
  |000 nul |001 soh |002 stx |003 etx |004 eot |005 enq |006 ack |007 bel |
  |010 bs  |011 ht  |012 nl  |013 vt  |014 np  |015 cr  |016 so  |017 si  |
  |020 dle |021 dc1 |022 dc2 |023 dc3 |024 dc4 |025 nak |026 syn |027 etb |
  |030 can |031 em  |032 sub |033 esc |034 fs  |035 gs  |036 rs  |037 us  |
  |040 sp  |041 !   |042 "   |043 #   |044 $   |045 %   |046 &   |047 '   |
  |050 (   |051 )   |052 *   |053 +   |054 ,   |055 -   |056 .   |057 /   |
  |060 0   |061 1   |062 2   |063 3   |064 4   |065 5   |066 6   |067 7   |
  |070 8   |071 9   |072 :   |073 ;   |074 <   |075 =   |076 >   |077 ?   |
  |100 @   |101 A   |102 B   |103 C   |104 D   |105 E   |106 F   |107 G   |
  |110 H   |111 I   |112 J   |113 K   |114 L   |115 M   |116 N   |117 O   |
  |120 P   |121 Q   |122 R   |123 S   |124 T   |125 U   |126 V   |127 W   |
  |130 X   |131 Y   |132 Z   |133 [   |134 \   |135 ]   |136 ^   |137 _   |
  |140 `   |141 a   |142 b   |143 c   |144 d   |145 e   |146 f   |147 g   |
  |150 h   |151 i   |152 j   |153 k   |154 l   |155 m   |156 n   |157 o   |
  |160 p   |161 q   |162 r   |163 s   |164 t   |165 u   |166 v   |167 w   |
  |170 x   |171 y   |172 z   |173 {   |174 |   |175 }   |176 ~   |177 del |
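As a rough illustration of the ranges above, the following shell sketch builds a string containing three backspaces (octal 010) and uses tr to delete every byte coded 000-037 or 177. The variable names are illustrative only, and the sketch assumes a POSIX-style shell with printf and tr available.

```shell
# Build a sample string with three backspaces (octal 010) in the middle.
name=$(printf 'aaa\010\010\010zzz')

# Delete every byte whose octal code is 000-037 or 177.
clean=$(printf '%s' "$name" | tr -d '\000-\037\177')

printf '%s\n' "$clean"    # prints: aaazzz
```

Note that stripping the control characters leaves "aaazzz", not the "zzz" the user saw on the screen: the backspaces only hid the "aaa", they never removed it.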

How did a "non-printing" character get into my file name?

The backspace character (CONTROL-H, ^H, or \010) is quite common. It is usually introduced into file names, by accident, when your terminal is misconfigured and you use the backspace during file naming operations.

For instance, the file name,

  aaa^H^H^Hzzz
was intended by the user to be
  zzz

It was created by typing "aaa", pressing the backspace key 3 times in an attempt to delete the "aaa", and typing "zzz". (A Unix file copy command, such as "cp a.out aaa^H^H^Hzzz", would do it.) Due to terminal misconfiguration, the codes for the backspaces were included in the file name.

To make life interesting, the "ls" or "ls -l" command shows the file name as,

  zzz

because the three backspaces are interpreted, "backspacing" over the "aaa" on the screen, so that only "zzz" appears. (As described below, "ls -b" or "ls -q" will display, and not interpret, control characters.)

Other unusual characters, like ^B, ^M, and ^], make an occasional appearance in file names.
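To experiment safely, a problem file name like the one described above can be created on purpose in a scratch directory. This is only a sketch: it assumes mktemp is available, and the variable names are made up for the example.

```shell
# Work in a throwaway directory so no real files are touched.
dir=$(mktemp -d)

# Create an empty file named aaa^H^H^Hzzz (three backspaces, octal 010).
: > "$dir/$(printf 'aaa\010\010\010zzz')"

# Captured through a pipe, the name is nine bytes long, although a
# terminal would display only "zzz".
name=$(ls "$dir")
printf '%s' "$name" | wc -c

rm -rf "$dir"
```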

Locating "non-printing" characters in file names

The ls command has all the options you need to detect "hidden" characters:

  Command   Function                                           Sample Output
  ls        hides/interprets nongraphic characters             zzz
  ls -b     prints octal codes for nongraphic characters       aaa\010\010\010zzz
  ls -q     prints question marks for nongraphic characters    aaa???zzz
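When a directory holds many files, scanning "ls -q" output by eye is tedious. One possible shortcut, sketched below, is to let find report only the names that contain a control character. It assumes a find whose -name patterns accept the [[:cntrl:]] character class (GNU find does); the scratch directory and file names are invented for the demonstration.

```shell
dir=$(mktemp -d)
: > "$dir/$(printf 'aaa\010\010\010zzz')"   # name containing three backspaces
: > "$dir/plain"                            # ordinary name, should not be reported

# Report only names that contain a control character; ls -dq prints each
# hit with question marks so the listing is safe to display.
found=$(LC_ALL=C find "$dir" -name '*[[:cntrl:]]*' -exec ls -dq {} \;)
printf '%s\n' "$found"

rm -rf "$dir"
```

Setting LC_ALL=C keeps the definition of "control character" tied to the ASCII table rather than to the current locale.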

Extirpating non-printing characters

We recommend two approaches, both of which require care and caution. If you're unsure, please contact User Support for assistance.

  1. Using mv and wild-card characters:

    The shell interprets question marks in file names as single-position wild-card characters and expands them before "mv" runs. "mv -i" inquires before overwriting any file. Thus, "ls -q" and "mv -i" become a convenient pair of commands.

    Here's the approach to cleaning up a problem file name:

    1. "cd" to the directory.
    2. Type "ls -l" to show the file name with control characters hidden. Call this "desired-name." (You might want a different name, which is fine.)
    3. Type "ls -lq" to show it with question marks in place of control characters. Call this "wild-card-name."
    4. To rename the file, type: mv -i wild-card-name  desired-name
  2. Using find and inode number:

    This is the definitive method, guaranteed to work. It uses the find command, which is powerful (i.e., "dangerous") and unfamiliar to many. Thus, it's our second choice. Be careful: an accident with "find ... -exec ... " can quickly delete all your files.

    Here's the approach:

    1. "cd" to the directory.
    2. Type "ls -l" to show the file name with control characters hidden. Call this "desired-name"  (You might want a different name, which is fine.)
    3. Type "ls -il" to show it again, this time with the file's inode number <INUM>. (Within a file system, every file has its own, unique inode number.)
    4. Use the following find command, from within the directory in question, to rename the file:
       find * \( -type d -prune \) -o -inum <INUM> -exec mv -i {} desired-name \;
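The inode approach can be rehearsed on a scratch file before using it on real data. The sketch below is a variant, not the command above: it starts find at "." and uses "! -name . -prune" to stay in the current directory, which is an assumption made here so that names beginning with "-" cannot be mistaken for find options. The scratch directory and the awk lookup are likewise illustrative.

```shell
dir=$(mktemp -d)
cd "$dir" || exit 1
: > "$(printf 'aaa\010\010\010zzz')"    # problem file with three backspaces

# Look up the inode number in the "ls -iq" listing (first column).
inum=$(ls -iq | awk '$2 == "aaa???zzz" { print $1 }')

# "! -name . -prune" stops find from descending into subdirectories;
# -inum selects the one entry with the matching inode number.
find . ! -name . -prune -inum "$inum" -exec mv -i {} zzz \;

result=$(ls -q)
printf '%s\n' "$result"    # prints: zzz

cd - >/dev/null
rm -rf "$dir"
```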
      

Example Cleanup Session

  chilkoot% cd /tmp/marybob

  chilkoot% ls -l
  total 32
  -rw-------   1 marybob wrkgroup       0 Sep 21 10:31 
  -rw-------   1 marybob wrkgroup       0 Sep 21 09:43 zzz
  -rw-------   1 marybob wrkgroup       0 Sep 21filename
  drwx------   2 marybob wrkgroup    4096 Sep 21 09:43 Directory

  chilkoot% ls -lq
  total 32
  -rw-------   1 marybob wrkgroup       0 Sep 21 10:31 ??????
  -rw-------   1 marybob wrkgroup       0 Sep 21 09:43 aaa???zzz
  -rw-------   1 marybob wrkgroup       0 Sep 21 11:00 ???????ba??filename
  drwx------   2 marybob wrkgroup    4096 Sep 21 09:43 xxx???Directory

  chilkoot% mv -i xxx???Directory Directory

  chilkoot% mv -i ???????ba??filename filename

  chilkoot% mv -i aaa???zzz zzz

  chilkoot% rm -i ??????
  cmd-3187 rm: r? yve 

  chilkoot% ls -l
  total 32
  -rw-------   1 marybob wrkgroup       0 Sep 21 11:00 filename
  drwx------   2 marybob wrkgroup    4096 Sep 21 09:43 Directory
  -rw-------   1 marybob wrkgroup       0 Sep 21 09:43 zzz
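A question-mark pattern matches any character, not only control characters, so it can match more files than intended. The sketch below shows one way to check the match count before renaming, and to narrow the pattern with the [[:cntrl:]] class. It assumes a shell that supports POSIX character classes in patterns; the file names are invented for the demonstration.

```shell
dir=$(mktemp -d)
cd "$dir" || exit 1
: > "$(printf 'aaa\010\010\010zzz')"    # the problem file
: > aaa123zzz                           # an innocent file that also matches aaa???zzz

# Expand the loose pattern into the positional parameters and count matches.
set -- aaa???zzz
loose=$#

# Narrow the pattern so each position must be a control character.
set -- aaa[[:cntrl:]][[:cntrl:]][[:cntrl:]]zzz
strict=$#

# Rename only if exactly one existing file matched.
if [ "$strict" -eq 1 ] && [ -e "$1" ]; then
    mv -i "$1" zzz
fi

result=$(ls -q)
cd - >/dev/null
rm -rf "$dir"
```

The "-e" test matters because a pattern that matches nothing is left unexpanded, which would otherwise also count as one "match".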

Other problem characters in file names

Quoting file names containing graphic characters, like ":", ";", "|", and " ", keeps the shell from interpreting those characters and helps commands resolve the names. For instance, the following command changes a file name containing spaces:

  chilkoot% mv -i "from a mac" from_a_mac

A leading "-" in a file name often appears to "rm" or "mv" as a command option. Prepending an explicit path to the file name removes this confusion. For example, the following command removes a file named "-i" from the present working directory ("./"), inquiring first:

  chilkoot% rm -i ./-i 
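Besides an explicit path, most commands that follow the standard utility conventions accept "--" to mark the end of options. The sketch below tries both techniques in a scratch directory. It leaves out the "-i" option only so that it runs without prompting; the file names are illustrative.

```shell
dir=$(mktemp -d)
cd "$dir" || exit 1

: > ./-i            # a file literally named "-i"
: > "from a mac"    # a file name containing spaces

# "--" marks the end of options, so the following "-i" is a file name.
mv -- -i dash_i

# Quoting keeps the three words together as one file name.
mv "from a mac" from_a_mac

result=$(ls)
cd - >/dev/null
rm -rf "$dir"
```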

One last trick: od

It can be baffling when white-space occurs at the end of a filename. Even "ls -b", for example, won't show the file name "space at end ":

chilkoot% ls -lq
total 0
-rw-------    1 marybob   wrkgroup      0 Jun 19 17:10 space at end 
chilkoot%  mv "space at end" newname
space at end - No such file or directory

The "octal dump" or "od" command can show you the exact octal codes comprising your file names, without interpretation. Pipe the output of ls into od -b:

chilkoot% ls | od -b
0000000 163 160 141 143 145 040 141 164 040 145 156 144 040 012
0000016

The last code, 012, is the new-line (added by ls). The second-to-last code, 040, is the trailing space. od -c interprets the codes, and also shows the trailing space:

chilkoot% ls | od -c
0000000   s   p   a   c   e       a   t       e   n   d      \n
0000016

Renaming this file, now that the erstwhile hidden space has been revealed, is simple:

chilkoot% mv -i "space at end "  nospace_at_end
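A quicker, if less exact, check than od is to print each name between brackets so that leading or trailing white-space stands out. This is only a sketch; the loop and the variable names are illustrative, and od remains the tool to use when control characters may also be present.

```shell
dir=$(mktemp -d)
cd "$dir" || exit 1
: > "space at end "     # note the trailing space in the name

# Print every name in the directory between brackets.
shown=$(for f in *; do printf '[%s]\n' "$f"; done)
printf '%s\n' "$shown"    # prints: [space at end ]

cd - >/dev/null
rm -rf "$dir"
```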

Preventing similar problems in the future

More Information

 

Arctic Region Supercomputing Center
PO Box 756020, Fairbanks, AK 99775 | voice: 907-450-8600 | email: