Character encoding detection tool for NodeJS
Dmitry Shirokov 9609ab4a7a Merge pull request #2 from spikying/master
declared variable 'confidence' before use, to make it work in 'strict mode' environment.
2015-11-30 08:46:40 +11:00
encoding updated indentation to 2-spaces. Added some {} around if/else 1-liners expressions. 2015-11-29 15:18:49 -05:00
test major style changes - tests 2013-11-22 15:40:19 +11:00
.gitignore minor changes 2013-05-07 15:01:08 +10:00
.travis.yml travis ci, cleanup 2013-05-07 23:27:05 +10:00
LICENSE version bump 2013-05-04 19:33:16 +10:00
README.md syntax highlighting for readme file 2013-11-15 16:08:09 +11:00
index.js major style changes for rest of the files 2013-11-22 15:37:41 +11:00
match.js major style changes for rest of the files 2013-11-22 15:37:41 +11:00
package.json version bump 0.0.8 2013-08-16 14:54:37 +10:00

README.md

chardet

Chardet is a character encoding detection module for NodeJS written in pure JavaScript. The module is based on the ICU project (http://site.icu-project.org/), which uses character occurrence analysis to determine the most probable encoding.
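To give a rough feel for what occurrence analysis means, here is a toy sketch. It is not chardet's or ICU's actual algorithm: the `profiles` table, `scoreProfile` and `guessEncoding` are hypothetical names invented for this illustration, and a real detector uses much richer statistics. The idea is only that the same bytes look "typical" under one encoding and unusual under another.

```javascript
// Toy illustration only: NOT the algorithm used by chardet or ICU.
// Each hypothetical profile lists the byte values of six frequent Russian
// letters (о, е, а, и, н, т) as stored in one single-byte encoding.
var profiles = [
  { name: 'windows-1251', common: [0xee, 0xe5, 0xe0, 0xe8, 0xed, 0xf2] },
  { name: 'KOI8-R', common: [0xcf, 0xc5, 0xc1, 0xc9, 0xce, 0xd4] }
];

// Percentage of non-ASCII bytes that are "common" for the given profile.
function scoreProfile(bytes, profile) {
  var high = 0;
  var hits = 0;
  for (var i = 0; i < bytes.length; i++) {
    if (bytes[i] >= 0x80) {
      high++;
      if (profile.common.indexOf(bytes[i]) !== -1) {
        hits++;
      }
    }
  }
  return high === 0 ? 0 : Math.round(100 * hits / high);
}

// Score every profile and keep the one with the highest confidence.
function guessEncoding(bytes) {
  var best = null;
  profiles.forEach(function (profile) {
    var confidence = scoreProfile(bytes, profile);
    if (best === null || confidence > best.confidence) {
      best = { name: profile.name, confidence: confidence };
    }
  });
  return best;
}

// The Russian word "нет" encoded as KOI8-R bytes.
var sample = [0xce, 0xc5, 0xd4];
var guess = guessEncoding(sample);
console.log(guess.name, guess.confidence);
```

The sample bytes all fall in the KOI8-R profile and none fall in the windows-1251 profile, so the toy scorer prefers KOI8-R.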

Installation

npm i chardet

Usage

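var chardet = require('chardet');
// detect the encoding of an in-memory buffer
// (Buffer.from replaces the deprecated new Buffer constructor)
chardet.detect(Buffer.from('hello there!'));
// or detect the encoding of a file asynchronously
chardet.detectFile('/path/to/file', function(err, encoding) {});
// or detect the encoding of a file synchronously
chardet.detectFileSync('/path/to/file');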
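In practice a caller usually wants a fallback when detection fails. The sketch below shows one way to wrap the calls above. `detectOrDefault` and `detectFileOrDefault` are hypothetical helpers, not part of chardet, and the sketch assumes that `detect` returns an encoding name or a falsy value and that `detectFile` follows the usual `(err, encoding)` callback convention shown above.

```javascript
// Hypothetical helpers, not part of chardet. `detector` is any object that
// exposes detect(buffer) and detectFile(path, callback); in real code you
// would pass require('chardet').

// Return the detected encoding, or `fallback` when nothing was detected.
function detectOrDefault(detector, buffer, fallback) {
  var encoding = detector.detect(buffer);
  return encoding ? encoding : fallback;
}

// Asynchronous variant: forwards errors, applies the fallback otherwise.
function detectFileOrDefault(detector, path, fallback, callback) {
  detector.detectFile(path, function (err, encoding) {
    if (err) {
      return callback(err);
    }
    callback(null, encoding ? encoding : fallback);
  });
}

// Example call (requires chardet to be installed):
// var chardet = require('chardet');
// detectOrDefault(chardet, Buffer.from('hello there!'), 'UTF-8');
```

Passing the detector in as an argument keeps the helpers easy to test with a stub and avoids a hard dependency on a particular chardet version.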

Supported Encodings:

  • UTF-8
  • UTF-16 LE
  • UTF-16 BE
  • UTF-32 LE
  • UTF-32 BE
  • ISO-2022-JP
  • ISO-2022-KR
  • ISO-2022-CN
  • Shift-JIS
  • Big5
  • EUC-JP
  • EUC-KR
  • GB18030
  • ISO-8859-1
  • ISO-8859-2
  • ISO-8859-5
  • ISO-8859-6
  • ISO-8859-7
  • ISO-8859-8
  • ISO-8859-9
  • windows-1250
  • windows-1251
  • windows-1252
  • windows-1253
  • windows-1254
  • windows-1255
  • windows-1256
  • KOI8-R

Currently only these encodings are supported; more will be added soon.