{{Short description|Non-printing computer data item}}
{{Redirect|␦|the Arabic question mark|؟|the rhetorical question or irony mark|⸮}}
{{Redirect|Ctrl+Z|the "undo" function|Undo}}
{{Use dmy dates|date=February 2020|cs1-dates=y}}
{{Use list-defined references|date=December 2021}}
In computer data, a '''substitute character''' (␚) is a [[control character]] that is used to pad transmitted data in order to send it in blocks of fixed size, or to stand in place of a character that is recognized to be invalid, erroneous, or unrepresentable on a given device. It is also used as an escape sequence in some [[programming language]]s.
 
In the [[ASCII]] and [[Unicode]] [[character set]]s, this character is encoded by the number 26 ({{mono|1A}} [[hexadecimal|hex]]). Standard [[computer keyboard|keyboards]] transmit this code when the [[control key|{{keypress|Ctrl}}]] and {{keypress|[[Z]]}} keys are pressed simultaneously ({{mono|Ctrl+Z}}, by convention often described as {{mono|^Z}}).<ref name="Microsoft_126449"/> Unicode inherits this character from ASCII, but recommends that the [[replacement character]] (�, U+FFFD) be used instead to represent undecodable inputs when the output encoding is compatible with it.
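The following minimal Python sketch illustrates both codes. The {{code|ctrl_code}} helper is hypothetical. It assumes the common terminal convention that a Ctrl+letter chord yields the letter's ASCII code with all but the low five bits cleared, which turns Z (0x5A) into 0x1A. The second part uses Python's built-in {{code|"replace"}} decode error handler, which inserts U+FFFD for undecodable bytes.

```python
SUB = "\x1a"  # ASCII/Unicode SUBSTITUTE, code 26

def ctrl_code(letter: str) -> int:
    """Hypothetical helper: the code sent for Ctrl+<letter>, assuming the
    conventional mapping that keeps only the low five bits of the letter."""
    return ord(letter.upper()) & 0x1F

# 'Z' is 0x5A; 0x5A & 0x1F == 0x1A == 26
assert ctrl_code("Z") == ord(SUB)
print(hex(ctrl_code("Z")))  # 0x1a

# Unicode's preferred marker for undecodable input is U+FFFD, not SUB:
# the byte 0xFF never occurs in well-formed UTF-8, so it gets replaced.
decoded = b"ab\xffcd".decode("utf-8", errors="replace")
assert decoded == "ab\ufffdcd"
```

The same five-bit rule gives 3 for Ctrl+C, which is why the two chords map to neighbouring members of the C0 control set.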
 
==Uses==
===End of file===
{{Main|End-of-file}}
Historically, under the [[PDP-6]] monitor,<ref name="DEC_1965_PDP-6"/> [[RT-11]], [[OpenVMS|VMS]], and [[TOPS-10]],<ref name="DEC_1969_PDP-10"/> and in early PC [[CP/M]] 1 and 2 [[operating system]]s (and derivatives like [[MP/M]])<!-- and possibly 86-DOS before 0.42, 0.56 or 1.00 as well -- needs testing --> it was necessary to explicitly mark the [[end-of-file|end of a file]] (EOF) because the native [[filesystem]] could not record the exact file size by itself. Files were allocated in extents (records) of a fixed size, typically leaving some allocated but unused space at the end of each file.<ref name="Elliott_1998_CPM14"/><ref name="Elliott_1998_CPM22"/><ref name="DRI_1979_CPM20-IG"/><ref name="Hogan_1982_CP/M"/> Under CP/M, this extra space was filled with {{mono|1A}} ([[hexadecimal|hex]]) characters. The extended CP/M filesystems used by CP/M 3<!-- definitely 3.1 and 4.1, but probably since 3.0 --> and higher (and derivatives like [[Concurrent CP/M]], [[Concurrent DOS]], and [[DOS Plus]]) did support byte-granular files,<ref name="Elliott_1998_CPM31"/><ref name="Elliott_1998_DOSPLUS"/> so the marker was no longer a physical requirement. It remained a convention, however (especially for [[text file]]s), to ensure backward compatibility.
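The following Python sketch shows the padding rule. It assumes the 128-byte record size and the behaviour described in Hogan's CP/M guide, where no Ctrl+Z is added if the text already fills a whole number of records. The function names are hypothetical. Real CP/M programs worked record by record through the operating system rather than on in-memory byte strings.

```python
RECORD_SIZE = 128  # CP/M logical record size in bytes
SUB = 0x1A

def pad_cpm_text(data: bytes, record_size: int = RECORD_SIZE) -> bytes:
    """Pad text to a whole number of records with SUB bytes.

    If the data already ends on a record boundary, nothing is added,
    mirroring the behaviour described for CP/M text files.
    """
    remainder = len(data) % record_size
    if remainder == 0:
        return data
    return data + bytes([SUB]) * (record_size - remainder)

def unpad_cpm_text(stored: bytes) -> bytes:
    """Recover the text: everything before the first SUB byte."""
    end = stored.find(SUB)
    return stored if end == -1 else stored[:end]

text = b"HELLO\r\n"
stored = pad_cpm_text(text)        # 7 text bytes + 121 SUB bytes
assert len(stored) == RECORD_SIZE
assert unpad_cpm_text(stored) == text
```

The recovery step only works for text. As the Hogan quotation notes, a SUB byte is as likely as any other byte in a binary file, so CP/M relied on the allocated record count for non-text files.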
 
In [[CP/M]], [[86-DOS]], [[MS-DOS]], [[PC DOS]], [[DR-DOS]], and their various derivatives, the SUB character was also used to indicate the end of a character stream.{{citation needed|date=October 2023}} It therefore served to terminate user input in an interactive [[command line]] window, for example to finish console input redirection started by the command {{code|COPY CON: TYPEDTXT.TXT|dosbatch}}.
 
Although the character is no longer technically required to indicate the end of a file, as of 2017 many text editors{{which|date=October 2023}} and programming languages still<!--as of 2017--> support this convention. They either insert the character at the end of a file when editing (sometimes only when configured to do so) or at least cope with it properly in text files.{{citation needed|date=October 2021}} In such cases, it is often termed a "soft" EOF, as it does not necessarily represent the physical end of the file. It is rather a marker indicating that "there is no useful data beyond this point". In reality, more data may exist beyond this character up to the actual end of the file as recorded by the file system. The marker can therefore be used to hide file content when the file is displayed at the console or opened in editors. Many file format standards (e.g. [[Portable Network Graphics|PNG]] or [[GIF]]) include the SUB character in their headers to perform precisely this function. Some modern text file formats (e.g. [[CSV-1203]]<ref name="Mastpoint_2016_CSV-1203"/>) still recommend appending a trailing EOF character as the last character in the file. However, typing {{keypress|Control|Z}} does not embed an EOF character into a file in either [[MS-DOS]] or [[Microsoft Windows|Windows]], nor do the [[application programming interface|APIs]] of those systems use the character to denote the actual end of a file.
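The following short Python sketch shows a "soft" EOF. A reader that honours the convention stops at the first SUB, while the bytes after it are still physically present. The helper name is hypothetical. The last lines apply it to the 8-byte PNG signature, whose SUB byte is intended to stop a DOS-style console display right after "PNG".

```python
from typing import Tuple

SUB = b"\x1a"

def split_soft_eof(data: bytes) -> Tuple[bytes, bytes]:
    """Split data at the first SUB byte.

    Returns (visible, hidden): what a SUB-aware reader shows, and the
    bytes that still follow the marker in the file.
    """
    visible, _marker, hidden = data.partition(SUB)
    return visible, hidden

visible, hidden = split_soft_eof(b"public notes\r\n\x1asecret appendix")
print(visible)  # b'public notes\r\n'
print(hidden)   # b'secret appendix'

# The PNG signature places SUB right after "PNG\r\n".
PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"
print(split_soft_eof(PNG_SIGNATURE)[0])  # b'\x89PNG\r\n'
```

If no SUB is present, {{code|bytes.partition}} returns the whole input as the visible part and an empty hidden part. That matches a file with only a "hard" end of file.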
 
Some programming languages (e.g. [[Visual Basic]]) will not read past a "soft" EOF when using the built-in text file reading primitives (INPUT, LINE INPUT, etc.),{{citation needed|date=October 2023}} so alternative methods must be used, e.g. opening the file in binary mode or using the File System Object to read past it.
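The following Python sketch contrasts the two kinds of access. {{code|read_text_lines}} is a hypothetical stand-in for a text primitive that treats SUB as end of file, as described above for Visual Basic. It is written out by hand because no particular platform's text mode is assumed to behave this way. Reading the same stream in binary returns every byte.

```python
import io
from typing import BinaryIO, List

SUB = 0x1A

def read_text_lines(stream: BinaryIO) -> List[str]:
    """Read lines the way a SUB-aware text reader would: stop at the first SUB."""
    lines = []
    for raw in stream:                     # each item is one b"...\n" line
        cut = raw.find(SUB)
        if cut != -1:                      # soft EOF reached inside this line
            if cut:
                lines.append(raw[:cut].decode("ascii").rstrip("\r\n"))
            break
        lines.append(raw.decode("ascii").rstrip("\r\n"))
    return lines

data = b"line 1\r\nline 2\r\n\x1aline 3\r\n"
print(read_text_lines(io.BytesIO(data)))  # ['line 1', 'line 2']
print(io.BytesIO(data).read() == data)    # True: binary access sees line 3 too
```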
 
Character 26 was used to mark "End of file" even though [[ASCII]] calls this character Substitute and has other characters to indicate "End of file". Number 28, which is called "[[File Separator]]", has also been used for similar purposes.
 
===Other uses===
In [[Unix]]-like operating systems, this character is typically used in [[Shell (computing)|shell]]s as a way for the user to [[SIGTSTP|suspend]] the currently executing interactive process.<ref name="UW_Unix"/> The suspended process can then be resumed in ''foreground'' (interactive) mode, be made to resume execution in ''[[background process|background]]'' mode, or be [[exit (system call)|terminated]]. When a user enters this character at their [[computer terminal]], the currently running foreground process is sent a "terminal stop" ([[SIGTSTP]]) signal, which generally causes the process to suspend its execution. The user can later continue the process execution by using the "foreground" command (<code>[[fg (Unix)|fg]]</code>) or the "[[background process|background]]" command (<code>[[bg (Unix)|bg]]</code>).
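The stop/resume cycle can also be reproduced programmatically. The following Python sketch is POSIX only, and the function name is hypothetical. It sends a child process the same SIGTSTP that the terminal generates for Ctrl+Z, then the SIGCONT that resuming with {{code|fg}} or {{code|bg}} delivers. It uses {{code|os.waitpid}} to observe both state changes. The child is started in its own process group so that the example also works outside an interactive shell. This setup is an assumption of the sketch and is not part of normal job control.

```python
import os
import signal
import subprocess
import sys

def stop_and_continue():
    """Stop a child process with SIGTSTP, then resume it with SIGCONT.

    Returns (stopped, resumed) as observed through waitpid().
    """
    child = subprocess.Popen(
        [sys.executable, "-c", "import time; time.sleep(30)"],
        # Own process group: Linux discards SIGTSTP (unlike SIGSTOP) for
        # processes in an orphaned process group, which a script's group
        # can be when it is not started from an interactive shell.
        preexec_fn=os.setpgrp,
    )
    try:
        # The signal the terminal driver sends to the foreground job on Ctrl+Z.
        os.kill(child.pid, signal.SIGTSTP)
        _, status = os.waitpid(child.pid, os.WUNTRACED)
        stopped = os.WIFSTOPPED(status)

        # The signal a shell's fg or bg delivers to resume the job.
        os.kill(child.pid, signal.SIGCONT)
        _, status = os.waitpid(child.pid, os.WCONTINUED)
        resumed = os.WIFCONTINUED(status)
    finally:
        # Clean up: terminate the sleeping child and reap it.
        child.terminate()
        child.wait()
    return stopped, resumed
```

Calling {{code|stop_and_continue()}} on a typical Linux system is expected to report both state changes. A real shell also moves the job between the foreground and background process groups of the terminal, which this sketch does not model.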
 
The Unicode Security Considerations report<ref name="Unicode_USC"/> recommends this character as a safe replacement for unmappable characters during character set conversion.
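In Python, this recommendation can be followed with a custom codec error handler registered through the standard {{code|codecs.register_error}} function. The handler name {{code|"sub"}} is an arbitrary choice for this sketch. The handler emits one SUB for every character (when encoding) or byte (when decoding) that cannot be converted.

```python
import codecs

SUB = "\x1a"

def substitute_handler(err: UnicodeError):
    """Replace each unconvertible character (or byte) with SUB."""
    if isinstance(err, (UnicodeEncodeError, UnicodeDecodeError)):
        # One SUB per offending item; resume conversion after the bad run.
        return SUB * (err.end - err.start), err.end
    raise err

codecs.register_error("sub", substitute_handler)  # "sub" is an arbitrary name

print("naïve café".encode("ascii", errors="sub"))      # b'na\x1ave caf\x1a'
print(repr(b"caf\xe9".decode("ascii", errors="sub")))  # 'caf\x1a'
```

One SUB is emitted per item, not one per run of unconvertible characters, so the output keeps the same character count as the input.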
 
In many GUIs and applications, {{keypress|Control|Z}} ({{keypress|Command|Z}} on [[macOS]]) can be used to [[undo]] the last action. In many applications, actions earlier than the last one can also be undone by pressing {{keypress|Control|Z}} multiple times. {{keypress|Control|Z}} was one of a handful of [[computer keyboard|keyboard]] sequences chosen by the program designers at [[Xerox PARC]] to control [[text editor|text editing]]. Presumably these particular [[keystroke]]s were chosen because of their location on a standard [[QWERTY keyboard]], since the Z (undo), [[control-X|X]] (cut), [[control-C|C]] (copy), and [[control-V|V]] (paste) keys are located together at the left end of the bottom row.
 
== Representation ==
* [[ASCII]] and [[Unicode]] representation of "substitute":
*: Octal code: 32
*: Decimal code: 26
*: Hexadecimal code: 1A, U+001A
*: Mnemonic symbol: SUB
*: Binary value: 11010
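The representations listed above can be checked with a few lines of Python using only built-in formatting and the standard {{code|unicodedata}} module. The visible glyph ␚ is not part of the list. The sketch derives it from the Control Pictures block, which assigns each C0 control code ''n'' (0x00 to 0x1F) the symbol at U+2400 + ''n''.

```python
import unicodedata

SUB = "\x1a"
code = ord(SUB)

print(format(code, "o"))   # 32
print(code)                # 26
print(format(code, "X"))   # 1A
print(format(code, "b"))   # 11010
print(f"U+{code:04X}")     # U+001A

# General category "Cc" means a control character.
print(unicodedata.category(SUB))  # Cc

# Control Pictures: U+2400 + n is the printable symbol for C0 control n.
picture = chr(0x2400 + code)
print(unicodedata.name(picture))  # SYMBOL FOR SUBSTITUTE
```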
 
==See also==
* [[C0 and C1 control codes]] ([[ISO 646]])
* [[U+FFFD]] (Unicode replacement character �)
* [[Access key]]
* [[Control-C]]
* [[Control-\]]
* [[Keyboard shortcut]]
* [[List of file signatures]]
* {{mono|[[.notdef]]}}, a symbol (sometimes called by the slang term ''tofu'') used to represent a missing character
** [[Noto fonts]], a Google project to eliminate missing characters
 
==References==
{{reflist|refs=
<ref name="DRI_1979_CPM20-IG">{{cite book |title=CP/M 2.0 Interface Guide |chapter=2. Operating System Call Conventions |date=1979 |edition=1 |publisher=[[Digital Research]] |location=Pacific Grove, California, USA |page=5 |url=http://bitsavers.org/pdf/digitalResearch/cpm/2.0/CPM_2_0_Interface_Guide_1979.pdf |access-date=2020-02-28 |url-status=live |archive-url=https://web.archive.org/web/20200228175812/http://bitsavers.org/pdf/digitalResearch/cpm/2.0/CPM_2_0_Interface_Guide_1979.pdf |archive-date=2020-02-28 |quote=[...] The end of an [[ASCII]] file is denoted by a [[control-Z]] character (1AH) or a real end of file, returned by the [[CP/M]] read operation. Control-Z characters embedded within machine code files (e.g., [[COM file]]s) are ignored, however, and the end of file condition returned by CP/M is used to terminate read operations. [...]}} (56 pages)</ref>
<ref name="Hogan_1982_CP/M">{{cite book |title=Osborne CP/M User Guide - For All CP/M Users |chapter=3. CP/M Transient Commands |author-first=Thom |author-last=Hogan |publisher=[[A. Osborne/McGraw-Hill]] |date=1982 |edition=2 |location=Berkeley, California, USA <!-- |lccn=87-65432??? --> |isbn=0-931988-82-9 |page=[https://archive.org/details/osborne-cpm-users-guide_2nd-ed/page/n87 74] |url=https://archive.org/details/osborne-cpm-users-guide_2nd-ed |access-date=2020-02-28 |quote=[...] [[CP/M]] marks the end of an [[ASCII]] file by placing a [[CONTROL-z]] character in the file after the last data character. If the file contains an exact multiple of 128 characters, in which case adding the CONTROL-Z would waste 127 characters, CP/M does not do so. Use of the CONTROL-Z character as the [[end-of-file marker]] is possible because CONTROL-z is seldom used as data in ASCII files. In a non-ASCII file, however, CONTROL-Z is just as likely to occur as any other character. Therefore, it cannot be used as the end-of-file marker. CP/M uses a different method to mark the end of a non-ASCII file. CP/M assumes it has reached the end of the file when it has read the last record (basic unit of disk space) allocated to the file. The disk directory entry for each file contains a list of the disk records allocated to that file. This method relies on the size of the file, rather than its content, to locate the end of the file. [...]}} [https://archive.org/stream/osborne-cpm-users-guide_2nd-ed/OsborneCpmUsersGuideSecondEdition_djvu.txt][https://archive.org/download/osborne-cpm-users-guide_2nd-ed/OsborneCpmUsersGuideSecondEdition.pdf]</ref>
<ref name="DEC_1965_PDP-6">{{cite book |title=PDP-6 Multiprogramming System Manual |chapter=Table of IO Device Characteristics - Console or Teletypewriters |id=DEC-6-0-EX-SYS-UM-IP-PRE00 |publisher=[[Digital Equipment Corporation]] (DEC) |publication-place=Maynard, Massachusetts, USA |date=1965 |page=43 |url=http://bitsavers.trailing-edge.com/pdf/dec/pdp6/DEC-6-0-EX-SYS-UM-IP-PRE00_Multiprogramming_System_Manual_1965.pdf |access-date=2014-07-10 |url-status=live |archive-url=https://web.archive.org/web/20140714140253/http://bitsavers.trailing-edge.com/pdf/dec/pdp6/DEC-6-0-EX-SYS-UM-IP-PRE00_Multiprogramming_System_Manual_1965.pdf |archive-date=2014-07-14}} (1+84+10 pages)</ref>
<ref name="DEC_1969_PDP-10">{{cite book |title=PDP-10 Reference Handbook: Communicating with the Monitor - Time-Sharing Monitors |volume=3 |chapter=5.1.1.1. Device Dependent Functions - Data Modes - Full-Duplex Software A(ASCII) and AL(ASCII Line) |publisher=[[Digital Equipment Corporation]] (DEC) |date=1969 |pages=5-3 – 5-6 [5-5 (431)] |url=http://bitsavers.org/pdf/dec/pdp10/1970_PDP-10_Ref/1970PDP10Ref_Part3.pdf |access-date=2014-07-10 |url-status=live |archive-url=https://web.archive.org/web/20111115083418/http://www.bitsavers.org/pdf/dec/pdp10/1970_PDP-10_Ref/1970PDP10Ref_Part3.pdf |archive-date=2011-11-15}} (207 pages)</ref>
<ref name="Microsoft_126449">{{cite web |title=Keyboard shortcuts for Windows |work=Microsoft Support |publisher=[[Microsoft]] |url=http://support.microsoft.com/kb/126449 |access-date=2012-06-02}}</ref>
<ref name="Elliott_1998_CPM14">{{cite web |author-first=John C. |author-last=Elliott |date=1998 |title=CP/M 1.4 disc formats |url=http://www.seasip.info/Cpm/format14.html |access-date=2021-11-18 |url-status=live |archive-url=https://web.archive.org/web/20201114231913/http://www.seasip.info/Cpm/format14.html |archive-date=2020-11-14}}</ref>
<ref name="Elliott_1998_CPM22">{{cite web |author-first=John C. |author-last=Elliott |date=1998 |title=CP/M 2.2 disc formats |url=http://www.seasip.info/Cpm/format22.html |access-date=2021-11-18 |url-status=live |archive-url=https://web.archive.org/web/20201105204828/http://www.seasip.info/Cpm/format22.html |archive-date=2020-11-05}}</ref>
<ref name="Elliott_1998_CPM31">{{cite web |author-first=John C. |author-last=Elliott |date=1998 |title=CP/M 3.1 disc formats |url=http://www.seasip.info/Cpm/format31.html |access-date=2021-11-18 |url-status=live |archive-url=https://web.archive.org/web/20211026154048/https://www.seasip.info/Cpm/format31.html |archive-date=2021-10-26}}</ref>
<ref name="Elliott_1998_DOSPLUS">{{cite web |author-first=John C. |author-last=Elliott |date=1998 |title=CP/M 4.1 disc formats |url=http://www.seasip.info/Cpm/format41.html |access-date=2021-11-18 |url-status=live |archive-url=https://web.archive.org/web/20201105174304/http://www.seasip.info/Cpm/format41.html |archive-date=2020-11-05}}</ref>
<ref name="UW_Unix">{{cite web |title=Quick Reference: Unix Commands |work=IT Connect |publisher=[[University of Washington]] |url=http://www.washington.edu/computing/unix/unixqr.html |access-date=2012-06-02}}</ref>
<ref name="Mastpoint_2016_CSV-1203">[http://www.mastpoint.com/csv-1203 CSV-1203 format specification] {{Webarchive|url=http://arquivo.pt/wayback/20160516100434/http://www.mastpoint.com/csv-1203 |date=2016-05-16}}</ref>
<ref name="Unicode_USC">[http://unicode.org/reports/tr36/#Text_Comparison Unicode Security Considerations report]</ref>
}}
 
==Further reading==
* [[Federal Standard 1037C]]
 
[[Category:Control characters]]