ASCII: Difference between revisions

| mime = us-ascii
| image = USASCII code chart.png
| caption = ASCII chart from [[MIL-STD-188#MIL-STD-188-100 series|MIL-STD-188-100]] (1972)
| lang = [[English language|English]] (made for; does not support all loanwords), [[Malay language|Malay]], [[Rotokas alphabet|Rotokas]], [[Interlingua]], [[Ido]], and [[X-SAMPA]] <!-- not Latin, see [[Apex (diacritic)]] and [[Interpunct]] -->
| extensions = * [[Unicode]]
'''ASCII''' ({{IPAc-en|audio=En-us-ASCII.ogg|ˈ|æ|s|k|iː}} {{respell|ASS|kee}}),<ref name="Mackenzie_1980">{{cite book |url=https://textfiles.meulie.net/bitsaved/Books/Mackenzie_CodedCharSets.pdf |title=Coded Character Sets, History and Development |series=The Systems Programming Series |author-last=Mackenzie |author-first=Charles E. |date=1980 |edition=1 |publisher=[[Addison-Wesley Publishing Company, Inc.]] |isbn=978-0-201-14460-4 |lccn=77-90165 |pages=6, 66, 211, 215, 217, 220, 223, 228, 236–238, 243–245, 247–253, 423, 425–428, 435–439 |access-date=2019-08-25 |archive-url=https://web.archive.org/web/20160526172151/https://textfiles.meulie.net/bitsaved/Books/Mackenzie_CodedCharSets.pdf |archive-date=May 26, 2016 |url-status=live |df=mdy-all }}</ref>{{rp|6}} abbreviated from '''American Standard Code for Information Interchange''', is a [[character encoding]] standard for electronic communication. ASCII codes represent text in computers, [[telecommunications equipment]], and other devices. Because of technical limitations of computer systems at the time it was invented, ASCII has just 128 [[code point]]s, of which only 95 are {{Pslink|printable characters}}, which severely limited its scope. Modern computer systems have evolved to use [[Unicode]], which has more than a million code points, but the first 128 of these are the same as the ASCII set.
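The counts given above can be checked with a short Python sketch. It uses only the built-in <code>str.isprintable()</code> method and the standard <code>ascii</code> and <code>utf-8</code> codecs; treating Python's notion of "printable" as equivalent to ASCII's graphic characters plus space is an assumption that happens to hold for these 128 code points.

```python
# The 128 ASCII code points are 0-127 (0x00-0x7F).
ascii_chars = [chr(code) for code in range(128)]

# str.isprintable() is False for the control codes 0-31 and 127 (DEL),
# leaving the 95 printable characters from space (0x20) to tilde (0x7E).
printable = [ch for ch in ascii_chars if ch.isprintable()]
print(len(ascii_chars), len(printable))  # 128 95

# Unicode's first 128 code points are the ASCII set, and UTF-8 encodes each
# of them as the same single byte, so pure ASCII text is already valid UTF-8.
for ch in ascii_chars:
    assert ch.encode("ascii") == ch.encode("utf-8") == bytes([ord(ch)])
```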
 
The [[Internet Assigned Numbers Authority]] (IANA) prefers the name '''US-ASCII''' for this character encoding.<ref name="IANA_2007">{{cite web|website=Internet Assigned Numbers Authority (IANA)|date=May 14, 2007|url=https://www.iana.org/assignments/character-sets|title=Character Sets|access-date=2019-08-25}}</ref>
 
ASCII is one of the IEEE milestones.
The code itself was patterned so that most control codes were together and all graphic codes were together, for ease of identification. The first two so-called ''ASCII sticks''{{Efn|name="NB_Stick"}}<ref name="Bemer_1980_Inside"/> (32 positions) were reserved for control characters.<ref name="Mackenzie_1980"/>{{rp|220, 236 §8,9}} The [[Space (punctuation)|"space" character]] had to come before graphics to make [[sorting algorithm|sorting]] easier, so it became position 20<sub>[[hexadecimal|hex]]</sub>;<ref name="Mackenzie_1980"/>{{rp|237 §10}} for the same reason, many special signs commonly used as separators were placed before digits. The committee decided it was important to support uppercase [[sixbit code pages|64-character alphabets]], and chose to pattern ASCII so it could be reduced easily to a usable 64-character set of graphic codes,<ref name="Mackenzie_1980"/>{{rp|228, 237 §14}} as was done in the [[DEC SIXBIT]] code (1963). [[Lower case|Lowercase]] letters were therefore not interleaved with [[uppercase]]. To keep options available for lowercase letters and other graphics, the special and numeric codes were arranged before the letters, and the letter ''A'' was placed in position 41<sub>[[hexadecimal|hex]]</sub> to match the draft of the corresponding British standard.<ref name="Mackenzie_1980"/>{{rp|238 §18}} The digits 0–9 are prefixed with 011, and the remaining [[Nibble|4 bits]] correspond to their respective values in binary, making conversion with [[binary-coded decimal]] straightforward (for example, 5 is encoded as 011''0101'', where 5 is ''0101'' in binary).
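Both layout decisions described above reduce to simple bit arithmetic, as the following Python sketch shows. The function names <code>digit_value</code> and <code>to_sixbit</code> are illustrative only; the SIXBIT mapping assumes DEC's convention of subtracting 20<sub>hex</sub> from the ASCII codes 20<sub>hex</sub> through 5F<sub>hex</sub> (space through underscore), a range that includes the uppercase letters but no lowercase ones.

```python
def digit_value(ch: str) -> int:
    """Return the numeric value of an ASCII digit by keeping its low 4 bits."""
    code = ord(ch)
    if not 0x30 <= code <= 0x39:      # digits occupy 011 0000 to 011 1001
        raise ValueError(f"not an ASCII digit: {ch!r}")
    return code & 0x0F                # drop the 011 prefix, leaving the BCD nibble


def to_sixbit(ch: str) -> int:
    """Map ASCII 0x20-0x5F (space through underscore) onto DEC SIXBIT 0-63."""
    code = ord(ch)
    if not 0x20 <= code <= 0x5F:
        raise ValueError(f"no SIXBIT equivalent: {ch!r}")
    return code - 0x20


print(format(ord("5"), "07b"))          # 0110101
print(digit_value("5"))                 # 5
print(to_sixbit(" "), to_sixbit("A"))   # 0 33
```

Because lowercase letters sit above 5F<sub>hex</sub>, dropping them leaves a contiguous 64-character block, which is what made the reduction "easy".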
 
Many of the non-alphanumeric characters were positioned to correspond to their shifted position on typewriters; an important subtlety is that these were based on ''mechanical'' typewriters, not ''electric'' typewriters.<ref name="Savard">{{cite web |title=Computer Keyboards |url=http://www.quadibloc.com/comp/kybint.htm |author-first=John J. G. |author-last=Savard |access-date=2014-08-24 |archive-url=https://web.archive.org/web/20140924183236/http://www.quadibloc.com/comp/kybint.htm |archive-date=September 24, 2014 |url-status=live }}</ref> Mechanical typewriters followed the [[de facto standard|''de facto'' standard]] set by the [[Remington No. 2]] (1878), the first typewriter with a shift key, and the shifted values of <code>23456789-</code> were <code>"#$%_&'()</code>{{snd}} early typewriters omitted ''0'' and ''1'', using ''O'' (capital letter ''o'') and ''l'' (lowercase letter ''L'') instead, but <code>1!</code> and <code>0)</code> pairs became standard once 0 and 1 became common. Thus, in ASCII <code>!"#$%</code> were placed in the second stick,{{Efn|name="NB_Stick"}}<ref name="Bemer_1980_Inside"/> positions 1–5, corresponding to the digits 1–5 in the adjacent stick.{{Efn|name="NB_Stick"}}<ref name="Bemer_1980_Inside"/> The parentheses could not correspond to ''9'' and ''0'', however, because the place corresponding to ''0'' was taken by the space character. This was accommodated by removing <code>_</code> (underscore) from ''6'' and shifting the remaining characters, which corresponded to many European typewriters that placed the parentheses with ''8'' and ''9''. This discrepancy from typewriters led to [[bit-paired keyboard]]s, notably the [[Teletype Model 33]], which used the left-shifted layout corresponding to ASCII, unlike traditional mechanical typewriters.
 
Electric typewriters, notably the [[IBM Selectric]] (1961), used a somewhat different layout that has become ''de facto'' standard on computers{{snd}} following the [[IBM PC]] (1981), especially [[Model M]] (1984){{snd}} and thus shift values for symbols on modern keyboards do not correspond as closely to the ASCII table as earlier keyboards did. The <code>/?</code> pair also dates to the No. 2, and the <code>,&lt; .&gt;</code> pairs were used on some keyboards (others, including the No. 2, did not shift <code>,</code> (comma) or <code>.</code> (full stop) so they could be used in uppercase without unshifting). However, ASCII split the <code>;:</code> pair (dating to No. 2), and rearranged mathematical symbols (varied conventions, commonly <code>-* =+</code>) to <code>:* ;+ -=</code>.
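The pairings described above can be modelled as toggling a single bit. The sketch below assumes an idealized bit-paired keyboard whose unshifted keys are the digits 1–9 plus the punctuation keys named in the text; it is not a description of any specific Teletype keyboard, and <code>bit_paired_shift</code> is a hypothetical helper.

```python
# Unshifted keys on an idealized bit-paired keyboard (assumption: the digits
# 1-9 plus the punctuation keys discussed in the text).
UNSHIFTED = "123456789:;,-./"


def bit_paired_shift(ch: str) -> str:
    """Return the shifted character for ch; other keys are returned unchanged."""
    if ch in UNSHIFTED:
        # Each shifted symbol differs from its key's code only in bit 0x10.
        return chr(ord(ch) ^ 0x10)
    return ch


print("".join(bit_paired_shift(c) for c in "123456789"))  # !"#$%&'()
print([bit_paired_shift(c) for c in ":;-,./"])            # ['*', '+', '=', '<', '>', '?']
```

Applying the same toggle to ''0'' (30<sub>hex</sub>) gives 20<sub>hex</sub>, the space character, which is why the parentheses had to move down to ''8'' and ''9''.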
Probably the most influential single device affecting the interpretation of these characters was the [[Teletype Model 33]] ASR, which was a printing terminal with an available [[punched tape|paper tape]] reader/punch option. Paper tape was a very popular medium for long-term program storage until the 1980s, less costly and in some ways less fragile than magnetic tape. In particular, the Teletype Model 33 machine assignments for codes 17 (control-Q, DC1, also known as XON), 19 (control-S, DC3, also known as XOFF), and 127 ([[Delete key|delete]]) became ''de facto'' standards. The Model 33 was also notable for taking the description of control-G (code 7, BEL, meaning audibly alert the operator) literally, as the unit contained an actual bell which it rang when it received a BEL character. Because the keytop for the O key also showed a left-arrow symbol (from ASCII-1963, which had this character instead of [[underscore]]), a noncompliant use of code 15 (control-O, shift in) interpreted as "delete previous character" was also adopted by many early timesharing systems but eventually fell out of use.
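The codes quoted above follow from a common ASCII keyboard convention: holding Control with a key in the 40<sub>hex</sub>–5F<sub>hex</sub> column keeps only the low five bits of that key's code. The Python sketch below assumes that convention; the helper name <code>ctrl</code> is hypothetical, and DEL (127) cannot be produced this way.

```python
def ctrl(key: str) -> int:
    """Code produced by Control plus a key in the 0x40-0x5F column (@, A-Z, and five symbols)."""
    code = ord(key.upper())
    if not 0x40 <= code <= 0x5F:
        raise ValueError(f"no control code for {key!r}")
    return code & 0x1F  # keep the low five bits, clearing bit 0x40


NAMES = {7: "BEL", 15: "SI", 17: "DC1 (XON)", 19: "DC3 (XOFF)"}
for key in "GOQS":
    print(f"control-{key} -> {ctrl(key)} {NAMES[ctrl(key)]}")
# control-G -> 7 BEL
# control-O -> 15 SI
# control-Q -> 17 DC1 (XON)
# control-S -> 19 DC3 (XOFF)
```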
 
When a Teletype 33 ASR equipped with the automatic paper tape reader received a control-S (XOFF, an abbreviation for transmit off), it caused the tape reader to stop; receiving control-Q (XON, transmit on) caused the tape reader to resume. This so-called [[Flow control (data)|flow control]] technique was adopted by several early computer operating systems as a "handshaking" signal warning a sender to stop transmission because of impending [[buffer overflow]]; it persists to this day in many systems as a manual output control technique. On some systems, control-S retains its meaning, but control-Q is replaced by a second control-S to resume output.
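The handshake described above can be sketched as a small state machine. This is a toy model under stated assumptions, not the code of any operating system: the class name <code>XonXoffSender</code> and its methods are hypothetical, and real systems implement the behaviour in the serial driver (on POSIX systems it is enabled through the termios <code>IXON</code> and <code>IXOFF</code> flags). The optional <code>xoff_toggles</code> mode models systems where a second control-S resumes output.

```python
XON = 0x11   # DC1, control-Q: resume transmission
XOFF = 0x13  # DC3, control-S: pause transmission


class XonXoffSender:
    """Toy sender that honours XON/XOFF software flow control."""

    def __init__(self, data: bytes, xoff_toggles: bool = False) -> None:
        self.pending = bytearray(data)
        self.paused = False
        # Some systems let a second control-S resume output instead of control-Q.
        self.xoff_toggles = xoff_toggles

    def receive_control(self, byte: int) -> None:
        """React to a flow-control byte arriving from the receiver."""
        if byte == XOFF:
            self.paused = (not self.paused) if self.xoff_toggles else True
        elif byte == XON:
            self.paused = False

    def send(self, max_bytes: int) -> bytes:
        """Return up to max_bytes of pending data, or nothing while paused."""
        if self.paused:
            return b""
        chunk = bytes(self.pending[:max_bytes])
        del self.pending[:max_bytes]
        return chunk


sender = XonXoffSender(b"HELLO WORLD")
print(sender.send(5))          # b'HELLO'
sender.receive_control(XOFF)   # receiver's buffer is nearly full
print(sender.send(5))          # b''
sender.receive_control(XON)    # receiver has caught up
print(sender.send(5))          # b' WORL'
```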
 
The 33 ASR also could be configured to employ control-R (DC2) and control-T (DC4) to start and stop the tape punch; on some units equipped with this function, the corresponding control character lettering on the keycap above the letter was TAPE and <s>TAPE</s> respectively.<ref name="McConnell">{{cite web |title=Understanding ASCII Codes |author-last1=McConnell |author-first1=Robert |author-last2=Haynes |author-first2=James |author-last3=Warren |author-first3=Richard |url=http://www.nadcomm.com/ascii_code.htm |access-date=2014-05-11 |archive-url=https://web.archive.org/web/20140227190425/http://www.nadcomm.com/ascii_code.htm |archive-date=February 27, 2014 |url-status=dead}}</ref>
 
====Delete vs backspace====
The Teletype could not move its typehead backwards, so it did not have a key on its keyboard to send a BS (backspace). Instead, there was a key marked {{keypress|RUB OUT}} that sent code 127 (DEL). The purpose of this key was to erase mistakes in a manually input paper tape: the operator had to push a button on the tape punch to back it up, then type the rubout, which punched all holes and replaced the mistake with a character that was intended to be ignored.<ref>{{cite mailing list |url=http://lists.gnu.org/archive/html/help-gnu-emacs/2014-05/msg00448.html |title=Re: editor and word processor history (was: Re: RTF for emacs) |author=Barry Margolin |mailing-list=help-gnu-emacs |date=May 29, 2014 |access-date=July 11, 2014 |archive-url=https://web.archive.org/web/20140714133149/http://lists.gnu.org/archive/html/help-gnu-emacs/2014-05/msg00448.html |archive-date=July 14, 2014 |url-status=live }}</ref> Teletypes were commonly used with the less-expensive computers from [[Digital Equipment Corporation]] (DEC); these systems had to use whatever keys were available, and thus the DEL character was assigned to erase the previous character.<ref name="pdp-6-monitor-manual">{{cite web |url=http://bitsavers.trailing-edge.com/pdf/dec/pdp6/DEC-6-0-EX-SYS-UM-IP-PRE00_Multiprogramming_System_Manual_1965.pdf |title=PDP-6 Multiprogramming System Manual |page=43 |publisher=[[Digital Equipment Corporation]] (DEC) |date=1965 |access-date=July 10, 2014 |archive-url=https://web.archive.org/web/20140714140253/http://bitsavers.trailing-edge.com/pdf/dec/pdp6/DEC-6-0-EX-SYS-UM-IP-PRE00_Multiprogramming_System_Manual_1965.pdf |archive-date=July 14, 2014 |url-status=live }}</ref><ref name="pdp-10-monitor-manual">{{cite web |url=http://bitsavers.org/pdf/dec/pdp10/1970_PDP-10_Ref/1970PDP10Ref_Part3.pdf |title=PDP-10 Reference Handbook, Book 3, Communicating with the Monitor |at=p. 5-5 |publisher=[[Digital Equipment Corporation]] (DEC) |date=1969 |access-date=July 10, 2014 |archive-url=https://web.archive.org/web/20111115083418/http://www.bitsavers.org/pdf/dec/pdp10/1970_PDP-10_Ref/1970PDP10Ref_Part3.pdf |archive-date=November 15, 2011 |url-status=live }}</ref> Because of this, DEC video terminals (by default) sent the DEL character for the key marked "Backspace" while the separate key marked "Delete" sent an [[escape sequence]]; many other competing terminals sent a BS character for the backspace key.
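The rubout trick works because DEL is 1111111 in binary, and punching can only add holes, never remove them. The Python sketch below models that; it considers only the seven ASCII data bits and ignores any parity hole on real tape, and the helper names <code>punch_over</code> and <code>read_tape</code> are hypothetical.

```python
DEL = 0x7F  # 111 1111: all seven data holes punched


def punch_over(existing: int, new: int) -> int:
    """Punching can add holes but never remove them, so the result is a bitwise OR."""
    return existing | new


def read_tape(frames: list) -> bytes:
    """Return the characters on the tape, skipping rubbed-out (DEL) frames."""
    return bytes(frame for frame in frames if frame != DEL)


tape = [ord("H"), ord("X")]            # the X is a typing mistake
tape[-1] = punch_over(tape[-1], DEL)   # back the punch up one frame and press RUB OUT
tape.append(ord("I"))                  # continue typing
print(read_tape(tape))                 # b'HI'
```

Whatever character was originally punched, overpunching it with DEL always yields DEL, so a single rubout code suffices to cancel any mistake.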
 
The early Unix tty drivers, unlike some modern implementations, allowed only one character to be set to erase the previous character in canonical input processing (where a very simple line editor is available); this could be set to BS ''or'' DEL, but not both, so users repeatedly had to choose one setting to match the terminal they were using ([[Shell (computing)|shells]] that allow line editing, such as [[KornShell|ksh]], [[Bash (Unix shell)|bash]], and [[Z shell|zsh]], understand both). The assumption that no key sent a BS character allowed Ctrl+H to be used for other purposes, such as the "help" prefix command in [[GNU Emacs]].<ref>{{cite web|url=https://www.gnu.org/software/emacs/manual/html_node/emacs/Help.html|title=Help - GNU Emacs Manual|access-date=July 11, 2018|archive-url=https://web.archive.org/web/20180711223750/https://www.gnu.org/software/emacs/manual/html_node/emacs/Help.html|archive-date=July 11, 2018|url-status=live}}</ref>
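The ambiguity can be illustrated with a minimal model of canonical-mode erase handling. This is a simplified sketch, not actual tty driver code: <code>canonical_line</code> is a hypothetical function that processes a single line and treats every byte in <code>erase_chars</code> as "erase the previous character", keeping any other byte literally.

```python
BS, DEL = 0x08, 0x7F  # control-H and RUB OUT


def canonical_line(raw: bytes, erase_chars: frozenset) -> bytes:
    """Apply a minimal line editor to one line of input.

    Each byte in erase_chars removes the previous byte; every other byte,
    including an unrecognized BS or DEL, is kept literally.
    """
    line = bytearray()
    for byte in raw:
        if byte in erase_chars:
            if line:
                line.pop()
        else:
            line.append(byte)
    return bytes(line)


typed = b"lz\x08s"  # user on a BS-sending terminal corrects "lz" to "ls"

# Driver configured with DEL as the only erase character: BS is stored literally.
print(canonical_line(typed, frozenset({DEL})))      # b'lz\x08s'
# Line-editing shell that accepts either character.
print(canonical_line(typed, frozenset({BS, DEL})))  # b'ls'
```

On a POSIX system the configured erase character is the <code>VERASE</code> entry of the control-character array, which Python exposes through <code>termios.tcgetattr()</code>.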
 
====Escape====
Many more of the control characters have been assigned meanings quite different from their original ones. The "escape" character (ESC, code 27), for example, was originally intended to allow other control characters to be sent as literals instead of invoking their meaning, forming an "escape sequence". This is the same meaning of "escape" encountered in URL encodings, [[C (programming language)|C language]] strings, and other systems where certain characters have a reserved meaning. Over time, this interpretation was co-opted and eventually changed.
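URL percent-encoding shows this original sense of "escape" in a form that is still in everyday use: a reserved character announces that what follows is to be taken as data rather than with its usual meaning. The sketch below uses the standard Python functions <code>urllib.parse.quote</code> and <code>unquote</code>; note that the escape character itself must also be escaped.

```python
from urllib.parse import quote, unquote

# "%" is the escape character: "%XX" stands for the byte with hex value XX,
# so characters with a reserved meaning (including "%" itself) can be sent literally.
encoded = quote("a b&c%")
print(encoded)           # a%20b%26c%25
print(unquote(encoded))  # a b&c%
```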
 
In modern usage, an ESC sent ''to'' the terminal usually indicates the start of a command sequence, which can be used to address the cursor, scroll a region, set/query various terminal properties, and more. These sequences usually take the form of a so-called "[[ANSI escape code]]" (often starting with a "[[Control Sequence Introducer]]", "CSI", "{{Mono|ESC [}}") from ECMA-48 (1972) and its successors. Some escape sequences do not have introducers, like the [[VT100]] full reset command "{{Mono|ESC c}}".<ref>{{cite web|url=https://invisible-island.net/xterm/ctlseqs/ctlseqs.html|title=XTerm Control Sequences|access-date=January 17, 2024}}</ref>
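A few of the command sequences mentioned above can be built as plain strings. The Python sketch below covers cursor positioning (CUP), clearing the screen (ED with parameter 2), setting a scrolling region (the VT100 DECSTBM sequence, also supported by xterm), and the introducer-less full reset (RIS); the helper names are hypothetical, and whether a given terminal honours each sequence depends on its emulation.

```python
ESC = "\x1b"      # code 27
CSI = ESC + "["   # Control Sequence Introducer


def cursor_position(row: int, col: int) -> str:
    """CUP: move the cursor to a 1-based row and column."""
    return f"{CSI}{row};{col}H"


def set_scroll_region(top: int, bottom: int) -> str:
    """DECSTBM: restrict scrolling to rows top..bottom."""
    return f"{CSI}{top};{bottom}r"


ERASE_DISPLAY = CSI + "2J"  # ED with parameter 2: clear the whole screen
FULL_RESET = ESC + "c"      # RIS: an escape sequence with no CSI

print(repr(cursor_position(5, 10)))   # '\x1b[5;10H'
print(repr(set_scroll_region(2, 20))) # '\x1b[2;20r'
```

Printing the <code>repr</code> rather than the raw string shows the bytes without the terminal acting on them.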
 
In contrast, an ESC read ''from'' the terminal is most often treated as an [[out-of-band data|out-of-band]] character that terminates an operation or special mode, as in the [[Text Editor and Corrector|TECO]] and [[Vi (text editor)|vi]] [[text editor]]s. In [[graphical user interface]] (GUI) and [[window (computing)|windowing]] systems, ESC generally causes an application to abort its current operation or to [[exit (system call)|exit]] (terminate) altogether.
The inherent ambiguity of many control characters, combined with their historical usage, created problems when transferring "plain text" files between systems. The best example of this is the [[newline]] problem on various [[operating system]]s. Teletype machines required that a line of text be terminated with both "carriage return" (which moves the printhead to the beginning of the line) and "line feed" (which advances the paper one line without moving the printhead). The name "carriage return" comes from the fact that on a manual [[typewriter]] the carriage holding the paper moves while the typebars that strike the ribbon remain stationary. The entire carriage had to be pushed (returned) to the right in order to position the paper for the next line.
 
DEC operating systems ([[OS/8]], [[RT-11]], [[RSX-11]], [[RSTS/E|RSTS]], [[TOPS-10]], etc.) used both characters to mark the end of a line so that the console device (originally Teletype machines) would work. By the time so-called "glass TTYs" (later called CRTs or "dumb terminals") came along, the convention was so well established that [[backward compatibility]] necessitated continuing to follow it. When [[Gary Kildall]] created [[CP/M]], he was inspired by some of the command line interface conventions used in DEC's RT-11 operating system.
 
Until the introduction of PC DOS in 1981, [[IBM]] had no influence on this convention because its 1970s operating systems used EBCDIC encoding instead of ASCII and were oriented toward punch-card input and line printer output, on which the concept of "carriage return" was meaningless. IBM's PC DOS (also marketed as [[MS-DOS]] by Microsoft) inherited the convention by virtue of being loosely based on CP/M,<ref>{{cite web|url=http://dosmandrivel.blogspot.com/2007/08/is-dos-rip-off-of-cpm.html|title=Is DOS a Rip-Off of CP/M?|author=Tim Paterson|date=August 8, 2007|website=DosMan Drivel|author-link=Tim Paterson|access-date=April 19, 2018|archive-url=https://web.archive.org/web/20180420075137/http://dosmandrivel.blogspot.com/2007/08/is-dos-rip-off-of-cpm.html|archive-date=April 20, 2018|url-status=live}}</ref> and [[Windows]] in turn inherited it from MS-DOS.
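Moving text between systems that end lines with CR LF (codes 13 and 10, the DEC, CP/M, DOS and Windows convention) and systems that use LF alone (such as Unix) therefore requires translating line endings. The following is a minimal Python sketch; the function names are hypothetical, and it assumes the input contains no bare CR characters that should also be converted.

```python
CR, LF = "\r", "\n"  # carriage return (13) and line feed (10)


def to_lf(text: str) -> str:
    """Convert CR LF line endings to bare LF."""
    return text.replace(CR + LF, LF)


def to_crlf(text: str) -> str:
    """Convert to CR LF endings, normalizing first so existing pairs are not doubled."""
    return to_lf(text).replace(LF, CR + LF)


print(repr(to_lf("one\r\ntwo\r\n")))  # 'one\ntwo\n'
print(repr(to_crlf("one\ntwo\r\n")))  # 'one\r\ntwo\r\n'
```

Python's own <code>open()</code> performs a comparable translation in text mode, controlled by its <code>newline</code> parameter.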
{{cols|colwidth=30em}}
* [[3568 ASCII]] – an asteroid named after the character encoding
* {{annotated link|Alt codes}}
* {{annotated link|ASCII 8}}
* {{annotated link|ASCII art}}
* {{annotated link|ASCII Ribbon Campaign}}
* [[Basic Latin (Unicode block)]] – ASCII as a subset of Unicode
* {{annotated link|Extended ASCII}}
* [[HTML decimal character rendering]]
* [[Jargon File]] – a glossary of computer programmer slang which includes a list of common slang names for ASCII characters
==========(PLEASE NOTE)============ -->
* {{cite web |title=C0 Controls and Basic Latin – Range: 0000–007F |work=The Unicode Standard 8.0 |date=2015 |orig-year=1991 |publisher=[[Unicode, Inc.]] |url=https://www.unicode.org/charts/PDF/U0000.pdf |access-date=2016-05-26 |url-status=live |archive-url=https://web.archive.org/web/20160526182105/http://www.unicode.org/charts/PDF/U0000.pdf |archive-date=2016-05-26}}
 
 
{{Character encodings|state=collapsed}}
{{Authority control}}
 
[[Category:ASCII| ]]
[[Category:Computer-related introductions in 1963]]