### JavaScript port of Markus Kuhn's wcwidth() implementation

The following explanation comes from the original C implementation:

This is an implementation of wcwidth() and wcswidth() (defined in
IEEE Std 1003.1-2001) for Unicode.

http://www.opengroup.org/onlinepubs/007904975/functions/wcwidth.html
http://www.opengroup.org/onlinepubs/007904975/functions/wcswidth.html
								
In fixed-width output devices, Latin characters all occupy a single
"cell" position of equal width, whereas ideographic CJK characters
occupy two such cells. Interoperability between terminal-line
applications and (teletype-style) character terminals using the
UTF-8 encoding requires agreement on which character should advance
the cursor by how many cell positions. No established formal
standards exist at present on which Unicode character shall occupy
how many cell positions on character terminals. These routines are
a first attempt at defining such behavior based on simple rules
applied to data provided by the Unicode Consortium.
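The cell-counting idea described above can be sketched roughly as follows. This is a simplified illustration, not the code of this port: the names `sketchWcwidth` and `sketchWcswidth` are hypothetical, the wide ranges are modeled on the range check in the C source, and zero-width handling is reduced to a single combining block (U+0300 to U+036F) as a simplifying assumption, whereas the real implementation consults a much longer table.

```javascript
// Hypothetical sketch; names and the reduced tables are assumptions,
// not the API or data of this port.
function sketchWcwidth(ucs) {
  // NUL occupies no cell.
  if (ucs === 0) return 0;
  // C0/C1 control characters and DEL have no defined width.
  if (ucs < 32 || (ucs >= 0x7f && ucs < 0xa0)) return -1;
  // Assumption: only the Combining Diacritical Marks block is treated
  // as zero-width here; the real table covers many more ranges.
  if (ucs >= 0x0300 && ucs <= 0x036f) return 0;
  // Ranges treated as double-width (East Asian Wide and Fullwidth).
  const wide =
    ucs >= 0x1100 &&
    (ucs <= 0x115f ||                                      // Hangul Jamo initial consonants
      ucs === 0x2329 || ucs === 0x232a ||                  // angle brackets
      (ucs >= 0x2e80 && ucs <= 0xa4cf && ucs !== 0x303f) || // CJK ... Yi
      (ucs >= 0xac00 && ucs <= 0xd7a3) ||                  // Hangul Syllables
      (ucs >= 0xf900 && ucs <= 0xfaff) ||                  // CJK Compatibility Ideographs
      (ucs >= 0xfe10 && ucs <= 0xfe19) ||                  // Vertical Forms
      (ucs >= 0xfe30 && ucs <= 0xfe6f) ||                  // CJK Compatibility Forms
      (ucs >= 0xff00 && ucs <= 0xff60) ||                  // Fullwidth Forms
      (ucs >= 0xffe0 && ucs <= 0xffe6) ||
      (ucs >= 0x20000 && ucs <= 0x2fffd) ||
      (ucs >= 0x30000 && ucs <= 0x3fffd));
  return wide ? 2 : 1;
}

// Sum the cell widths of every code point in a string.
// Returns -1 as soon as a character without a defined width is found.
function sketchWcswidth(str) {
  let total = 0;
  for (const ch of str) {          // iterates by code point, not UTF-16 unit
    const w = sketchWcwidth(ch.codePointAt(0));
    if (w < 0) return -1;
    total += w;
  }
  return total;
}

console.log(sketchWcswidth('abc'));    // 3: three single-cell Latin letters
console.log(sketchWcswidth('日本語')); // 6: three double-cell ideographs
console.log(sketchWcswidth('a\tb'));   // -1: TAB is a control character
```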
								
For some graphical characters, the Unicode standard explicitly
defines a character-cell width via the definition of the East Asian
Fullwidth (F), Wide (W), Halfwidth (H), and Narrow (Na) classes.
In all these cases, there is no ambiguity about which width a
terminal shall use. For characters in the East Asian Ambiguous (A)
class, the width choice depends purely on a preference for backward
compatibility with either historic CJK or Western practice.
Choosing single-width for these characters is easy to justify as
the appropriate long-term solution, as the CJK practice of
displaying these characters as double-width comes from historic
implementation simplicity (8-bit encoded characters were displayed
single-width and 16-bit ones double-width, even for Greek,
Cyrillic, etc.) and not from any typographic considerations.
								
Much less clear is the choice of width for the Not East Asian
(Neutral) class. Existing practice does not dictate a width for any
of these characters. It would nevertheless make sense
typographically to allocate two character cells to characters such
as EM SPACE or VOLUME INTEGRAL, which cannot be
represented adequately with a single-width glyph. The following
routines at present merely assign a single-cell width to all
neutral characters, in the interest of simplicity. This is not
entirely satisfactory and should be reconsidered before
establishing a formal standard in this area. At the moment, the
question of which Not East Asian (Neutral) characters should be
represented by double-width glyphs cannot yet be answered by
applying a simple rule to the Unicode database content. Setting
up a proper standard for the behavior of UTF-8 character terminals
will require a careful analysis not only of each Unicode character,
but also of each presentation form, something the author of these
routines has so far avoided doing.
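The class-to-width rules described in the two paragraphs above can be summarized in a small lookup. This is only a sketch of those rules: `cellsForClass` and its `ambiguousAsWide` option are hypothetical names introduced for illustration, and the function takes an East Asian Width class code as input rather than deriving the class from a code point.

```javascript
// Hypothetical helper; not part of this port's API.
// Maps an East Asian Width class code (UAX #11) to a cell count,
// following the rules described in the text above.
function cellsForClass(eastAsianWidth, { ambiguousAsWide = false } = {}) {
  switch (eastAsianWidth) {
    case 'F':  // Fullwidth
    case 'W':  // Wide
      return 2;
    case 'H':  // Halfwidth
    case 'Na': // Narrow
      return 1;
    case 'A':  // Ambiguous: single-width by default, double-width for
               // backward compatibility with historic CJK practice
      return ambiguousAsWide ? 2 : 1;
    case 'N':  // Neutral: always one cell, in the interest of simplicity
      return 1;
    default:
      throw new RangeError(`unknown East Asian Width class: ${eastAsianWidth}`);
  }
}

console.log(cellsForClass('W'));                            // 2
console.log(cellsForClass('A'));                            // 1
console.log(cellsForClass('A', { ambiguousAsWide: true })); // 2
```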
								
http://www.unicode.org/unicode/reports/tr11/

Markus Kuhn -- 2007-05-26 (Unicode 5.0)

Permission to use, copy, modify, and distribute this software
for any purpose and without fee is hereby granted. The author
disclaims all warranties with regard to this software.

Latest version: http://www.cl.cam.ac.uk/~mgk25/ucs/wcwidth.c