STREX - Multilingual String Extractor
Latest version: 0.51 (April 2008)
Click here to download the latest version.
Free of charge as long as you use the software as is.
NO WARRANTY.
You shall not modify the software.
STREX extracts the strings contained in a given file. It is aimed mainly at files used in the Windows environment.
It writes the extracted strings to an HTML file.
When STREX can guess them successfully, the output also shows which code page and which language are used.
Languages that are not widely used in the Windows environment will not be supported.
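As a rough illustration of the idea only (not STREX's actual algorithm, which is not described here), the Python sketch below decodes a file's bytes under one code page, keeps runs of printable characters, and writes them into an HTML list. The function names, the minimum run length of 4, and the HTML layout are all assumptions made for this example.

```python
import html


def extract_strings(data: bytes, encoding: str = "cp1252", min_len: int = 4) -> list[str]:
    """Return runs of at least min_len printable characters found in data."""
    # Decode leniently: undecodable bytes become U+FFFD, which is treated
    # as a separator below, just like control characters.
    text = data.decode(encoding, errors="replace")
    runs: list[str] = []
    current: list[str] = []
    for ch in text:
        if ch.isprintable() and ch != "\ufffd":
            current.append(ch)
        else:
            if len(current) >= min_len:
                runs.append("".join(current))
            current = []
    if len(current) >= min_len:
        runs.append("".join(current))
    return runs


def to_html(strings: list[str], encoding: str) -> str:
    """Render the extracted strings as a minimal HTML page (hypothetical layout)."""
    items = "\n".join(f"<li>{html.escape(s)}</li>" for s in strings)
    title = html.escape(f"Strings decoded as {encoding}")
    return (
        "<html><head><meta charset=\"utf-8\">"
        f"<title>{title}</title></head>\n"
        f"<body><ul>\n{items}\n</ul></body></html>\n"
    )
```

For example, `to_html(extract_strings(open("sample.dll", "rb").read()), "cp1252")` would produce a page listing the strings found under code page 1252, where `sample.dll` is a placeholder file name.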
What is new or fixed
- version 0.33 (2006/10/31): Added Hebrew recognition. Reduced mis-recognition.
- version 0.34 (2006/11/11): Added Icelandic recognition.
- version 0.35 (2006/12/04): Added Bosnian/Croatian/Serbian recognition.
- version 0.36 (2007/01/31): Added Estonian recognition.
- version 0.37 (2007/02/07): Added Traditional Chinese recognition.
- version 0.38 (2007/03/18): Added Lithuanian recognition.
- version 0.39 (2007/04/02): Added Tagalog (Filipino) recognition.
- version 0.40 (2007/04/29): Added Hindi recognition.
- version 0.41 (2007/06/12): Added Slovene recognition.
- version 0.42 (2007/07/05): Added Persian (Farsi) recognition.
- version 0.43 (2007/08/14): Added Swahili recognition.
- version 0.44 (2007/09/12): Added Ukrainian recognition.
- version 0.45 (2007/10/27): Added Urdu recognition. Deleted UTF-7 from AutoDetect mode.
- version 0.46 (2007/10/30): Added Chinese in Pinyin recognition.
- version 0.47 (2007/11/18): Added Albanian recognition.
- version 0.48 (2007/12/18): Added Slovak recognition.
- version 0.49 (2008/01/25): Added Bengali recognition.
- version 0.50 (2008/02/08): Added Maltese recognition.
- version 0.51 (2008/04/25): Added Latvian recognition.
HOW TO USE
First, invoke STREX with only the file name and no other parameters.
The result shows which code page is used.
Then invoke STREX again with the -cp parameter and that code page number to extract all the strings.
In AUTODETECT mode, not all the strings are extracted.
For details, invoke STREX with no parameters.
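The two-step workflow above could be scripted roughly as follows. This is a sketch under assumptions: the executable name `strex`, the placement of the file name after the options, and the code page number being a separate argument after `-cp` are all guesses, so check them against the help text STREX prints when run with no parameters.

```python
import subprocess

# Assumption: the executable is called "strex" and is on the PATH.
STREX_EXE = "strex"


def autodetect_command(path: str) -> list[str]:
    """Step 1: file name only, so STREX guesses the code page."""
    return [STREX_EXE, path]


def codepage_command(path: str, code_page: int) -> list[str]:
    """Step 2: pass -cp with the code page number reported by step 1.

    Assumption: the number is a separate argument and the file name comes last.
    """
    return [STREX_EXE, "-cp", str(code_page), path]


def run(command: list[str]) -> int:
    """Run one STREX command and return its exit code."""
    return subprocess.run(command, check=False).returncode
```

A typical session would call `run(autodetect_command("sample.dll"))`, read the reported code page from the result, and then call `run(codepage_command("sample.dll", 1252))` if, say, code page 1252 was reported. The file name and the number here are placeholders.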
Supported code pages for AUTODETECT mode
- UTF-16 Little Endian
- UTF-16 Big Endian
- UTF-8
- LATIN1 (Windows Code Page 1252)
- Central European (Windows Code Page 1250)
- Cyrillic (Windows Code Page 1251)
- Greek (Windows Code Page 1253)
- Turkish (Windows Code Page 1254)
- Hebrew (Windows Code Page 1255)
- Arabic (Windows Code Page 1256)
- Baltic (Windows Code Page 1257)
- Vietnamese (Windows Code Page 1258)
- Thai (Windows Code Page 874)
- Japanese Shift JIS (Windows Code Page 932)
- Chinese GB 2312 (Windows Code Page 936)
- Korean KS-C (Windows Code Page 949)
- Chinese Big5 (Windows Code Page 950)
- Cyrillic KOI8-RU
Recognized languages: 46 languages
Only major languages are recognized.
Simplified and Traditional Chinese are not distinguished.
- Albanian
- Arabic
- Bengali
- Bosnian
- Bulgarian
- Chinese (Simplified and Traditional)
- Chinese in Pinyin (without tone marks)
- Croatian
- Czech
- Danish
- Dutch
- English
- Estonian
- Finnish
- French
- German
- Greek
- Hebrew
- Hindi
- Hungarian
- Icelandic
- Indonesian
- Italian
- Japanese
- Korean
- Latvian (new in version 0.51)
- Lithuanian
- Malay
- Maltese
- Norwegian
- Persian (Farsi)
- Polish
- Portuguese
- Romanian
- Russian
- Serbian
- Slovak
- Slovene
- Spanish
- Swahili
- Swedish
- Tagalog (Filipino)
- Thai
- Turkish
- Ukrainian
- Urdu
- Vietnamese
Copyright (C) 2006-2008, Masaki Suenaga
All Rights Reserved.
You can link to this page without my permission.