### JavaScript port of Markus Kuhn's wcwidth() implementation

The following explanation comes from the original C implementation:

This is an implementation of wcwidth() and wcswidth() (defined in
IEEE Std 1003.1-2001) for Unicode.

http://www.opengroup.org/onlinepubs/007904975/functions/wcwidth.html
http://www.opengroup.org/onlinepubs/007904975/functions/wcswidth.html

In fixed-width output devices, Latin characters all occupy a single
"cell" position of equal width, whereas ideographic CJK characters
occupy two such cells. Interoperability between terminal-line
applications and (teletype-style) character terminals using the
UTF-8 encoding requires agreement on which character should advance
the cursor by how many cell positions. No established formal
standards exist at present on which Unicode character shall occupy
how many cell positions on character terminals. These routines are
a first attempt at defining such behavior based on simple rules
applied to data provided by the Unicode Consortium.

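To make the idea of "cells per character" concrete, here is a minimal sketch in JavaScript. The names `cellWidth`, `stringWidth`, and `WIDE_RANGES` are hypothetical and are not taken from this package's API. The range table is abbreviated, loosely following the wide ranges used in the original C file, and zero-width combining characters are not handled at all.

```javascript
// Illustrative only: an abbreviated list of code point ranges treated as
// double-width. A real implementation needs a complete table.
const WIDE_RANGES = [
  [0x1100, 0x115f], // Hangul Jamo initial consonants
  [0x2e80, 0x303e], // CJK radicals, symbols and punctuation (up to U+303E)
  [0x3040, 0xa4cf], // Hiragana through Yi
  [0xac00, 0xd7a3], // Hangul syllables
  [0xf900, 0xfaff], // CJK compatibility ideographs
  [0xff00, 0xff60], // Fullwidth forms
  [0xffe0, 0xffe6], // Fullwidth signs
];

// Hypothetical helper: number of cells for one code point.
// Returns 0 for NUL, -1 for control characters, 2 for wide ranges, else 1.
function cellWidth(codePoint) {
  if (codePoint === 0) return 0;
  if (codePoint < 0x20 || (codePoint >= 0x7f && codePoint < 0xa0)) return -1;
  for (const [lo, hi] of WIDE_RANGES) {
    if (codePoint >= lo && codePoint <= hi) return 2;
  }
  return 1;
}

// Hypothetical helper: total cells for a string, or -1 if any character
// is a control character (mirroring the wcswidth() convention).
function stringWidth(str) {
  let total = 0;
  for (const ch of str) { // for...of iterates by code point, not UTF-16 unit
    const w = cellWidth(ch.codePointAt(0));
    if (w < 0) return -1;
    total += w;
  }
  return total;
}

console.log(stringWidth("abc"));    // 3
console.log(stringWidth("日本語")); // 6
```

A terminal application could use a function of this shape to decide how far the cursor advances after printing a string, provided the terminal applies the same rules.
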
For some graphical characters, the Unicode standard explicitly
defines a character-cell width via the definition of the East Asian
Fullwidth (F), Wide (W), Halfwidth (H), and Narrow (Na) classes.
In all these cases, there is no ambiguity about which width a
terminal shall use. For characters in the East Asian Ambiguous (A)
class, the width choice depends purely on a preference of backward
compatibility with either historic CJK or Western practice.
Choosing single-width for these characters is easy to justify as
the appropriate long-term solution, as the CJK practice of
displaying these characters as double-width comes from historic
implementation simplicity (8-bit encoded characters were displayed
single-width and 16-bit ones double-width, even for Greek,
Cyrillic, etc.) and not from any typographic considerations.

Much less clear is the choice of width for the Not East Asian
(Neutral) class. Existing practice does not dictate a width for any
of these characters. It would nevertheless make sense
typographically to allocate two character cells to characters such
as, for instance, EM SPACE or VOLUME INTEGRAL, which cannot be
represented adequately with a single-width glyph. The following
routines at present merely assign a single-cell width to all
neutral characters, in the interest of simplicity. This is not
entirely satisfactory and should be reconsidered before
establishing a formal standard in this area. At the moment, the
question of which Not East Asian (Neutral) characters should be
represented by double-width glyphs cannot yet be answered by
applying a simple rule from the Unicode database content. Setting
up a proper standard for the behavior of UTF-8 character terminals
will require a careful analysis not only of each Unicode character,
but also of each presentation form, something the author of these
routines has so far avoided doing.

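The width policy described in the two paragraphs above can be summarized as a small lookup from East Asian Width class to cell count. The following is only a sketch: `widthForClass` and the `ambiguousIsWide` option are hypothetical names, not part of this package, and the caller is assumed to already know the class of each character.

```javascript
// Hypothetical mapping from an East Asian Width class (as named in UAX #11)
// to the number of terminal cells, following the policy described above.
function widthForClass(eaw, { ambiguousIsWide = false } = {}) {
  switch (eaw) {
    case "F": // Fullwidth
    case "W": // Wide
      return 2;
    case "H": // Halfwidth
    case "Na": // Narrow
      return 1;
    case "A": // Ambiguous: single-width by default, double-width for
      // compatibility with historic CJK practice
      return ambiguousIsWide ? 2 : 1;
    case "N": // Neutral: single cell, in the interest of simplicity
      return 1;
    default:
      throw new RangeError(`unhandled East Asian Width class: ${eaw}`);
  }
}

console.log(widthForClass("W"));                            // 2
console.log(widthForClass("A"));                            // 1
console.log(widthForClass("A", { ambiguousIsWide: true })); // 2
```

Keeping the Ambiguous choice behind an explicit option makes the backward-compatibility trade-off visible to the caller instead of hard-coding it.
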
http://www.unicode.org/unicode/reports/tr11/

Markus Kuhn -- 2007-05-26 (Unicode 5.0)

Permission to use, copy, modify, and distribute this software
for any purpose and without fee is hereby granted. The author
disclaims all warranties with regard to this software.

Latest version: http://www.cl.cam.ac.uk/~mgk25/ucs/wcwidth.c