chardet
Chardet is a character encoding detection module written in pure JavaScript (TypeScript). The module uses occurrence analysis to determine the most probable encoding.
- Packed size is only 22 KB
- Works in all environments: Node / Browser / Native
- Works on all platforms: Linux / Mac / Windows
- No dependencies
- No native code / bindings
- 100% written in TypeScript
- Extensive code coverage
Installation
npm i chardet
Usage
To return the encoding with the highest confidence:
const chardet = require('chardet');
chardet.detect(Buffer.from('hello there!'));
// or
chardet.detectFile('/path/to/file').then(encoding => console.log(encoding));
// or
chardet.detectFileSync('/path/to/file');
To return the full list of possible encodings, use the analyse method.
const chardet = require('chardet');
chardet.analyse(Buffer.from('hello there!'));
The returned value is an array of objects sorted by confidence value in descending order:
[
{ confidence: 90, name: 'UTF-8' },
{ confidence: 20, name: 'windows-1252', lang: 'fr' }
];
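Because the result is a plain array of objects, it can be post-processed with ordinary array methods. The following is a minimal sketch, assuming the result shape shown above. `pickCandidates` and the threshold of 50 are hypothetical and not part of chardet, and the sample array is copied from the example output rather than produced by a real call.

```javascript
// Sample shaped like the analyse() output shown above (values copied from this README).
const matches = [
  { confidence: 90, name: 'UTF-8' },
  { confidence: 20, name: 'windows-1252', lang: 'fr' }
];

// Hypothetical helper, not part of chardet:
// keep only the names of candidates at or above a confidence threshold.
function pickCandidates(results, minConfidence) {
  return results
    .filter(match => match.confidence >= minConfidence)
    .map(match => match.name);
}

const plausible = pickCandidates(matches, 50);
console.log(plausible); // [ 'UTF-8' ]
```

In real use, `matches` would be the value returned by `chardet.analyse(...)`. A suitable threshold depends on your data, so treat the number here as a placeholder.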
Working with large data sets
Sometimes, when the data set is huge and you want to optimize performance (at the cost of some accuracy), you can sample only the first N bytes of the buffer:
chardet
.detectFile('/path/to/file', { sampleSize: 32 })
.then(encoding => console.log(encoding));
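The same idea can be applied by hand to data that is already in memory. This is a rough sketch, assuming you simply want to hand the detector a shorter buffer. `takeSample` is a hypothetical helper and not part of chardet, and it illustrates the sampling idea rather than how the `sampleSize` option is implemented internally.

```javascript
// Hypothetical helper, not part of chardet:
// return a view over at most the first `sampleSize` bytes of a buffer.
function takeSample(buffer, sampleSize) {
  // subarray() does not copy; it clamps the end index to the buffer length.
  return buffer.subarray(0, sampleSize);
}

const data = Buffer.from('hello there! '.repeat(1000));
const sample = takeSample(data, 32);
console.log(sample.length); // 32

// The sample could then be passed to detection, e.g. chardet.detect(sample).
```

A fixed cut can split a multi-byte character at the end of the sample, which is one reason a small sample may reduce accuracy.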
Supported Encodings:
- UTF-8
- UTF-16 LE
- UTF-16 BE
- UTF-32 LE
- UTF-32 BE
- ISO-2022-JP
- ISO-2022-KR
- ISO-2022-CN
- Shift_JIS
- Big5
- EUC-JP
- EUC-KR
- GB18030
- ISO-8859-1
- ISO-8859-2
- ISO-8859-5
- ISO-8859-6
- ISO-8859-7
- ISO-8859-8
- ISO-8859-9
- windows-1250
- windows-1251
- windows-1252
- windows-1253
- windows-1254
- windows-1255
- windows-1256
- KOI8-R
Currently only these encodings are supported.
TypeScript?
Yes. Type definitions are included.
References
- ICU project: http://site.icu-project.org/