ASCII: Difference between revisions

Revision as of 15:29, 11 May 2024 by JMF (talk \| contribs; extended confirmed user, 54,258 edits): Undid revision 1223330621 by Red9828 (talk). RV good faith but no. ASCII was only 7 bits. Storage was a different issue. Systems with 36-bit words stored five characters per word. Some systems had a parity bit. Tags: Undo, Mobile edit, Mobile web edit, Advanced mobile edit
Revision as of 02:11, 11 July 2024 by 135.23.116.37 (talk): Unicode does not have millions of code points anymore. It is limited to 17 planes of 65536 characters each, about 1.1 million.
(17 intermediate revisions by 13 users not shown)
Line 2:
{{hatnote group\| {{other uses}} {{Distinguish\|text=MS [[Windows-1252]] or other types of [[~~Extended~~extended ASCII]]}} }} {{Use mdy dates\|date=June 2013\|cs1-dates=y}}
Line 23:
\| classification = [[ISO/IEC 646\|ISO/IEC 646 series]] }}
'''ASCII''' ({{IPAc-en\|audio=En-us-ASCII.ogg\|ˈ\|æ\|s\|k\|iː}} {{respell\|ASS\|kee}}),<ref name="Mackenzie_1980">{{cite book \|url=https://textfiles.meulie.net/bitsaved/Books/Mackenzie_CodedCharSets.pdf \|title=Coded Character Sets, History and Development \|series=The Systems Programming Series \|author-last=Mackenzie \|author-first=Charles E. \|date=1980 \|edition=1 \|publisher=[[Addison-Wesley Publishing Company, Inc.]] \|isbn=978-0-201-14460-4 \|lccn=77-90165 \|pages=6, 66, 211, 215, 217, 220, 223, 228, 236–238, 243–245, 247–253, 423, 425–428, 435–439 \|access-date=2019-08-25 \|archive-url=https://web.archive.org/web/20160526172151/https://textfiles.meulie.net/bitsaved/Books/Mackenzie_CodedCharSets.pdf \|archive-date=May 26, 2016 \|url-status=live \|df=mdy-all }}</ref>{{rp\|6}} an acronym for '''American Standard Code for Information Interchange''', is a [[character encoding]] standard for electronic communication. ASCII codes represent text in computers, [[telecommunications equipment]], and other devices. Because of technical limitations of computer systems at the time it was invented, ASCII has just 128 [[code point]]s, of which only 95 are {{Pslink\|printable characters}}, which severely limited its scope. Modern computer systems have evolved to use [[Unicode]], which has ~~millions of~~ over a million code points, but the first 128 of these are the same as the ASCII set.
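The figures in the lead paragraph (128 code points, 95 of them printable, and an ASCII-compatible first block in Unicode) can be checked with a short Python sketch. This is only an illustration and not part of either revision: the helper name `is_printable_ascii` is hypothetical, the printable range 0x20 to 0x7E counts the space character, and UTF-8 is used as one concrete Unicode encoding for the comparison. The 17-plane figure comes from the edit summary above.

```python
# Illustrative check only; names below are hypothetical, not from the article.
ASCII_SIZE = 128  # a 7-bit code has 2**7 code points


def is_printable_ascii(code: int) -> bool:
    """True for the graphic characters 0x20 (space) through 0x7E (~)."""
    return 0x20 <= code <= 0x7E


printable = [c for c in range(ASCII_SIZE) if is_printable_ascii(c)]
controls = [c for c in range(ASCII_SIZE) if not is_printable_ascii(c)]

# Each ASCII code point encodes to the same single byte in ASCII and in UTF-8.
same_in_utf8 = all(
    chr(c).encode("ascii") == chr(c).encode("utf-8") == bytes([c])
    for c in range(ASCII_SIZE)
)

# Unicode code space as described in the edit summary: 17 planes of 65,536.
UNICODE_CODE_SPACE = 17 * 65536

print(len(printable), len(controls), same_in_utf8, UNICODE_CODE_SPACE)
# prints: 95 33 True 1114112
```

The 33 non-printable positions are the control codes 0 to 31 plus 127 (delete).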
The [[Internet Assigned Numbers Authority]] (IANA) prefers the name '''US-ASCII''' for this character encoding.<ref name="IANA_2007">{{cite web\|website=Internet Assigned Numbers Authority (IANA)\|date=May 14, 2007\|url=https://www.iana.org/assignments/character-sets\|title=Character Sets\|access-date=2019-08-25}}</ref> Line 40: Despite being an American standard, ASCII does not have a code point for the [[Cent (currency)\|cent]] (¢). It also does not support [[English terms with diacritical marks]] such as [[résumé]] and [[jalapeño]], or [[proper nouns]] with diacritical marks such as [[Beyoncé]]. ==<span class="anchor" id="1963"></span><span class="anchor" id="1965"></span><span class="anchor" id="1967"></span><span class="anchor" id="1968"></span><span class="anchor" id="1977"></span><span class="anchor" id="1986"></span><span class="anchor" id="1992"></span><span class="anchor" id="1997"></span><span class="anchor" id="2002"></span><span class="anchor" id="2007"></span><span class="anchor" id="2012"></span><span class="anchor" id="2017"></span><span class="anchor" id="2022"></span>History== [[File:ASCII1963-infobox-paths.svg\|thumb\|upright=1.25\|right\|ASCII (1963). [[Control Pictures]] of equivalent controls are shown where they exist, or a grey dot otherwise.]] The American Standard Code for Information Interchange (ASCII) was developed under the auspices of a committee of the American Standards Association (ASA), called the X3 committee, by its X3.2 (later X3L2) subcommittee, and later by that subcommittee's X3.2.4 working group (now [[INCITS]]). The ASA later became the United States of America Standards Institute (USASI)<ref name="Mackenzie_1980"/>{{rp\|211}} and ultimately became the [[American National Standards Institute]] (ANSI). Line 48: The X3 committee made other changes, including other new characters (the [[brace (punctuation)\|brace]] and [[vertical bar]] characters),<ref>Report of Meeting No. 
8, Task Group X3.2.4, December 17 and 18, 1963</ref> renaming some control characters (SOM became start of header (SOH)) and moving or removing others (RU was removed).<ref name="Mackenzie_1980"/>{{rp\|247–248}} ASCII was subsequently updated as USAS X3.4-1967,<ref name="ASCII-1967"/><ref name="Winter_2010">{{cite web \|title=US and International standards: ASCII \|url=http://homepages.cwi.nl/~dik/english/codes/stand.html#ascii \|author-first=Dik T. \|author=Winter \|date=2010 \|orig-year=2003 \|url-status=dead \|archive-url=https://web.archive.org/web/20100116001012/http://homepages.cwi.nl/~dik/english/codes/stand.html#ascii \|archive-date=2010-01-16}}</ref> then USAS X3.4-1968,<ref name="ASCII-1968">{{cite tech report \|title=USA Standard Code for Information Interchange, USAS X3.4-1968 \|url=https://archive.org/details/enf-ascii-1968-1970/ \|publisher=[[United States of America Standards Institute]] \|date=October 10, 1968}}</ref> ANSI X3.4-1977, and finally, ANSI X3.4-1986.<ref name="ASCII-1986"/><ref name="Salste_2016">{{cite web \|title=7-bit character sets: Revisions of ASCII \|author-first=Tuomas \|author-last=Salste \|publisher=Aivosto Oy \|date=January 2016 \|id={{URN\|nbn\|fi-fe201201011004}} \|url=http://www.aivosto.com/vbtips/charsets-7bit.html#body \|access-date=2016-06-13 \|url-status=live \|archive-url=https://web.archive.org/web/20160613145224/http://www.aivosto.com/vbtips/charsets-7bit.html#body \|archive-date=2016-06-13}}</ref>
===Revisions ~~of the ASCII standard:~~===
* ASA ~~<!-- Standard -->~~X3.4-1963<ref name="Mackenzie_1980"/><ref name="ASCII-1963"/><ref name="Winter_2010"/><ref name="Salste_2016"/>
* ASA X3.4-1965 (approved, but not published, nevertheless used by [[IBM 2260]] & [[IBM 2265\|2265]] Display Stations and [[IBM 2848]] Display Control)<ref name="Mackenzie_1980"/>{{rp\|423, 425–428, 435–439}}<ref name="SA_215">{{cite journal \|title=Information<!-- Title of issue, not title of article --> \|date=September 1966 \|volume=215 \|number=3 \|type=special edition \|journal=[[Scientific American]] \|jstor=e24931041 }}</ref><ref name="Winter_2010"/><ref name="Salste_2016"/>
* USAS ~~<!-- USA Standard -->~~X3.4-1967<ref name="Mackenzie_1980"/><ref name="ASCII-1967"/><ref name="Salste_2016"/>
* USAS X3.4-1968~~<!-- October 1968 -->~~<ref name="Mackenzie_1980"/><ref name="ASCII-1968"/><ref name="Salste_2016"/>
* ANSI ~~<!-- American National Standard -->~~X3.4-1977<ref name="Salste_2016"/>
* ANSI X3.4-1986<ref name="ASCII-1986"/><ref name="Salste_2016"/>
* ANSI X3.4-1986 (R1992)
* ANSI X3.4-1986 (R1997)
* ANSI INCITS 4-1986 (R2002)<ref name="Korpela_2014">{{cite book \|title=Unicode Explained – Internationalize Documents, Programs, and Web Sites \|author-first=Jukka K. \|author-last=Korpela \|edition=2nd release of 1st \|date=2014-03-14 \|orig-year=2006-06-07 \|publisher=[[O'Reilly Media, Inc.]] \|isbn=978-0-596-10121-3 \|page=118}}</ref>
* ANSI INCITS 4-1986 (R2007)~~<!-- official name with space and with (round brackets) -->~~<ref name="ANSI_INCITS_4-1986_2007">{{citation \|title=ANSI INCITS 4-1986 (R2007): American National Standard for Information Systems – Coded Character Sets – 7-Bit American National Standard Code for Information Interchange (7-Bit ASCII) \|date=2007 \|orig-year=1986 }}</ref>
* ~~(ANSI)<!-- not sure if "ANSI" is still part of the standard's official name, hence put in brackets for now -->~~ INCITS 4-1986~~[R2012]<!-- official name without space and with [square brackets] -->~~ (R2012)<ref name="INCITS_4-1986_R2012"/>
* ~~(ANSI)~~ INCITS 4-1986~~[R2017]<!-- official name without space and with [square brackets] -->~~ (R2017)<ref name="INCITS_4-1986_R2017"/>
* INCITS 4-1986 (R2022)<ref>https://webstore.ansi.org/standards/incits/incits1986r2022</ref>
In the X3.15 standard, the X3 committee also addressed how ASCII should be
transmitted ([[least significant bit]] first)<ref name="Mackenzie_1980"/>{{rp\|249–253}}<ref name="X3.15-1966">{{citation \|title=Bit Sequencing of the American National Standard Code for Information Interchange in Serial-by-Bit Data Transmission \|id=X3.15-1966 \|date=1966 \|publisher=[[American National Standards Institute]] (ANSI)}}</ref> and recorded on perforated tape. They proposed a [[9-track]] standard for magnetic tape and attempted to deal with some [[punched card]] formats. Line 69 ⟶ 70: ===Bit width=== The X3.2 subcommittee designed ASCII based on the earlier teleprinter encoding systems. Like other [[character encoding]]s, ASCII specifies a correspondence between digital bit patterns and [[character (computing)\|character]] symbols (i.e. [[grapheme]]s and [[control character]]s). This allows [[Digital data\|digital]] devices to communicate with each other and to process, store, and communicate character-oriented information such as written language. Before ASCII was developed, the encodings in use included 26 [[English alphabet\|alphabetic]] characters, 10 [[numerical digit]]s, and from 11 to 25 special graphic symbols. To include all these, and control characters compatible with the [[CCITT\|Comité Consultatif International Téléphonique et Télégraphique]] (CCITT) [[International Telegraph Alphabet No. 
2]] (ITA2) standard of ~~1924~~ 1932,~~<ref name="Bruxy_2005">{{Cite web \|date=2005-10-10 \|title=BruXy: Radio Teletype communication \|url=http://bruxy.regnet.cz/web/hamradio/EN/radio-teletype-communication \|url-status=live \|archive-url=https://web.archive.org/web/20160412130035/http://bruxy.regnet.cz/web/hamradio/EN/radio-teletype-communication \|archive-date=April 12, 2016 \|access-date=2016-05-09 \|quote=The transmitted code use International Telegraph Alphabet No. 2 (ITA-2) which was introduced by CCITT in 1924.}}</ref>~~<ref>{{cite web \|url=http://handle.itu.int/11.1004/020.1000/4.5.43.en.101 \|title=Telegraph Regulations and Final Protocol (Madrid, 1932) \|access-date=9 Jun 2024 \|archive-url=https://web.archive.org/web/20230821020920/https://search.itu.int/history/HistoryDigitalCollectionDocLibrary/4.5.43.en.101.pdf \|archive-date=21 August 2023}}</ref><ref name="bdcode">{{cite web \|author-last=Smith \|author-first=Gil \|title=Teletype Communication Codes \|publisher=Baudot.net \|date=2001 \|url=http://www.baudot.net/docs/smith--teletype-codes.pdf \|access-date=2008-07-11 \|archive-url=https://web.archive.org/web/20080820043949/http://www.baudot.net/docs/smith--teletype-codes.pdf \|archive-date=August 20, 2008 \|url-status=live }}</ref> [[FIELDATA]] (1956{{citation needed\|date=June 2016\|reason=My sources state 1957 rather than 1956, but Wikipedia states 1956 in various places. This needs to be sorted out with better sources.}}), and early [[EBCDIC]] (1963), more than 64 codes were required for ASCII.
ITA2 was in turn based on [[Baudot code]], the 5-bit telegraph code Émile Baudot invented in 1870 and patented in 1874.<ref name="bdcode" /> Line 97 ⟶ 98: ==<span class="anchor" id="Code chart"></span><span class="anchor" id="ASCII printable code chart"></span><span class="anchor" id="ASCII printable characters"></span>Character set== [[File:ASCII Table (suitable for printing).svg\|thumb]] {\|{{chset-table-header1\|ASCII (1977/1986)}} \|- Line 253 ⟶ 255: ===<span class="anchor" id="ASCII control characters"></span>Control characters=== [[File:US ASCII Control Character Symbols.png\|thumb\|right\|Early symbols assigned to the 32 control characters, space and delete characters. ([[ISO 2047]], MIL-STD-188-100, 1972)]] {{Main\|Control ~~characters~~character}} ASCII reserves the first 32 [[code point]]s (numbers 0–31 decimal) and the last one (number 127 decimal) for [[control character]]s. These are codes intended to control [[peripheral device]]s (such as [[computer printer\|printers]]), or to provide [[Metadata\|meta-information]] about data streams, such as those stored on magnetic tape. Despite their name, these code points do not represent printable characters (i.e. they are not characters at all, but signals). For debugging purposes, "placeholder" symbols (such as those given in [[ISO 2047]] and its predecessors) are assigned to them. Line 648 ⟶ 650: :<code>{ a[i] = '\n'; }</code> [[C trigraph]]s were created to solve this problem for [[ANSI C]], although their late introduction and inconsistent implementation in compilers limited their use. Many programmers kept their computers on ~~US-~~ASCII, so plain-text in Swedish, German etc. (for example, in e-mail or [[Usenet]]) contained "{, }" and similar variants in the middle of words, something those programmers got used to. 
For example, a Swedish programmer mailing another programmer asking if they should go for lunch, could get "N{ jag har sm\|rg}sar" as the answer, which should be "Nä jag har smörgåsar" meaning "No I've got sandwiches". In Japan and Korea, still {{As of\|2021\|alt=as of the 2020s\|post=,\|df=US}} a variation of ASCII is used, in which the [[backslash]] (5C hex) is rendered as ¥ (a [[Yen sign]], in Japan) or ₩ (a [[Won sign]], in Korea). This means that, for example, the file path C:\Users\Smith is shown as C:¥Users¥Smith (in Japan) or C:₩Users₩Smith (in Korea). Line 657 ⟶ 659: {{Main\|Extended ASCII}}{{See also\|ISO/IEC 8859\|UTF-8}} <!-- to be mentioned [[USASCII-8]] --> Eventually, as 8-, [[16-bit computing\|16-]], and [[32-bit computing\|32-bit]] (and later [[64-bit computing\|64-bit]]) computers began to replace [[12-bit computing\|12-]], [[18-bit computing\|18-]], and [[36-bit computing\|36-bit]] computers as the norm, it became common to use an 8-bit byte to store each character in memory, providing an opportunity for extended, 8-bit relatives of ASCII. In most cases these developed as true extensions of ASCII, leaving the original character-mapping intact, but adding additional character definitions after the first 128 (i.e., 7-bit) characters. ASCII itself remained a seven-bit code: the term "extended ASCII" has no official status. For some countries, 8-bit extensions of ASCII were developed that included support for characters used in local languages; for example, [[ISCII]] for India and [[VISCII]] for Vietnam. 
[[Kaypro]] [[CP/M]] computers used the "upper" 128 characters for the Greek alphabet.{{citation needed\|date=November 2023}} Line 667 ⟶ 669: IBM defined [[code page 437]] for the [[IBM PC]], replacing the control characters with graphic symbols such as [[Emoticon\|smiley faces]], and mapping additional graphic characters to the upper 128 positions.<ref>{{cite book \|url=http://www.bitsavers.org/pdf/ibm/pc/pc/6025008_PC_Technical_Reference_Aug81.pdf \|title=Technical Reference \|at=Appendix C. Of Characters Keystrokes and Color \|edition=First \|date=August 1981 \|series=Personal Computer Hardware Reference Library \|publisher=IBM}}</ref> [[Digital Equipment Corporation]] developed the [[Multinational Character Set]] (DEC-MCS) for use in the popular [[VT220]] [[computer terminal\|terminal]] as one of the first extensions designed more for international languages than for block graphics. [[Apple Inc.\|Apple]] defined [[Mac OS Roman]] for the Macintosh and [[Adobe Inc.\|Adobe]] defined the [[PostScript Standard Encoding]] for [[PostScript]]; both sets contained "international" letters, typographic symbols and punctuation marks instead of graphics, more like modern character sets. The [[ISO/IEC 8859]] standard (derived from the DEC-MCS) provided a standard that most systems copied (or at least were based on, when not copied exactly). A popular further extension designed by Microsoft, [[Windows-1252]] (often mislabeled as [[ISO-8859-1]]), added the typographic punctuation marks needed for traditional text printing. 
ISO-8859-1, Windows-1252, and the original 7-bit ASCII were the most common character ~~encodings~~encoding methods on the [[World Wide Web]] until 2008, when [[UTF-8]] overtook them.<ref name="UTF-8_2008"/> [[ISO/IEC 4873]] introduced 32 additional control codes defined in the 80–9F [[hexadecimal]] range, as part of extending the 7-bit ASCII encoding to become an 8-bit system.<ref name="Unicode-5.0_2006">{{cite book \|author=The Unicode Consortium \|editor-first=Julie D. \|editor-last=Allen \|title=The Unicode standard, Version 5.0 \|date=2006-10-27 \|publisher=[[Addison-Wesley Professional]] \|location=Upper Saddle River, New Jersey, US \|isbn=978-0-321-48091-0 \|chapter-url=http://unicode.org/book/ch13.pdf \|archive-url=https://ghostarchive.org/archive/20221009/http://unicode.org/book/ch13.pdf \|archive-date=2022-10-09 \|url-status=live \|access-date=2015-03-13 \|chapter=Chapter 13: Special Areas and Format Characters \|page=314}}</ref>
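The practical effect of mislabeling Windows-1252 as ISO-8859-1, and of the 80–9F control range, can be illustrated with a small Python sketch. It is an illustration under stated assumptions rather than anything taken from the article's sources: it relies on Python's built-in `cp1252` and `latin-1` codecs, where `latin-1` maps bytes 0x80 to 0x9F onto the corresponding control code points, and the sample bytes are made up for the example.

```python
import unicodedata

# Hypothetical sample: Windows-1252 curly quotes around an ASCII "A" (0x41).
raw = bytes([0x93, 0x41, 0x94])

as_cp1252 = raw.decode("cp1252")   # typographic quotation marks
as_latin1 = raw.decode("latin-1")  # same bytes read as 80-9F control codes

# The ASCII byte is unaffected by the choice of 8-bit extension.
assert as_cp1252[1] == as_latin1[1] == "A"

# Unicode general category: "Pi"/"Pf" are quotation marks, "Cc" is a control.
cp1252_categories = [unicodedata.category(ch) for ch in as_cp1252]
latin1_categories = [unicodedata.category(ch) for ch in as_latin1]

# The hexadecimal range 80-9F holds 32 positions.
c1_count = len(range(0x80, 0xA0))

print(cp1252_categories, latin1_categories, c1_count)
# prints: ['Pi', 'Lu', 'Pf'] ['Cc', 'Lu', 'Cc'] 32
```

Text that is really Windows-1252 but decoded as ISO-8859-1 therefore loses its typographic punctuation to invisible control codes, while the 7-bit ASCII portion survives unchanged.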