kmail

#include <encodingdetector.h>

Public Types

enum  EncodingChoiceSource {
  DefaultEncoding , AutoDetectedEncoding , BOM , EncodingFromXMLHeader ,
  EncodingFromMetaTag , EncodingFromHTTPHeader , UserChosenEncoding
}
 
enum  AutoDetectScript {
  None , SemiautomaticDetection , Arabic , Baltic ,
  CentralEuropean , ChineseSimplified , ChineseTraditional , Cyrillic ,
  Greek , Hebrew , Japanese , Korean ,
  NorthernSaami , SouthEasternEurope , Thai , Turkish ,
  Unicode , WesternEuropean
}
 

Public Member Functions

 EncodingDetector ()
 
 EncodingDetector (TQTextCodec *codec, EncodingChoiceSource source, AutoDetectScript script=None)
 
bool setEncoding (const char *encoding, EncodingChoiceSource type)
 
const char * encoding () const
 
bool visuallyOrdered () const
 
void setAutoDetectLanguage (AutoDetectScript)
 
AutoDetectScript autoDetectLanguage () const
 
EncodingChoiceSource encodingChoiceSource () const
 
bool analyze (const char *data, int len)
 
bool analyze (const TQByteArray &data)
 

Static Public Member Functions

static AutoDetectScript scriptForName (const TQString &lang)
 
static TQString nameForScript (AutoDetectScript)
 
static AutoDetectScript scriptForLanguageCode (const TQString &lang)
 
static bool hasAutoDetectionForScript (AutoDetectScript)
 

Protected Member Functions

bool errorsIfUtf8 (const char *data, int length)
 
TQTextDecoder * decoder ()
 

Detailed Description

Provides encoding detection capabilities.

Searches for an encoding declaration inside the raw data (HTML meta tags and the XML header). If it cannot find one, it falls back to heuristics for the language (script) set with setAutoDetectLanguage().
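The declaration search described above can be sketched in standalone C++. This is an illustration under stated assumptions, not the code in encodingdetector.cpp: the helper names valueAfterKey() and findDeclaredEncoding() are hypothetical, the 1024-byte scan limit is arbitrary, and the sketch does not handle whitespace around the '=' sign.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical helper, not part of EncodingDetector: returns the value that
// follows `key` (for example "charset=") in `lower`, or "" if there is none.
static std::string valueAfterKey(const std::string &lower, const std::string &key)
{
    std::size_t pos = lower.find(key);
    if (pos == std::string::npos)
        return "";
    pos += key.size();
    // Skip an optional opening quote.
    if (pos < lower.size() && (lower[pos] == '"' || lower[pos] == '\''))
        ++pos;
    std::size_t end = pos;
    // Accept the characters commonly found in encoding names.
    while (end < lower.size() &&
           (std::isalnum(static_cast<unsigned char>(lower[end])) ||
            lower[end] == '-' || lower[end] == '_' ||
            lower[end] == '.' || lower[end] == ':'))
        ++end;
    return lower.substr(pos, end - pos);
}

// Hypothetical helper: checks an XML declaration first, then HTML meta tags.
// The result is lower-cased; "" means no declaration was found.
std::string findDeclaredEncoding(const char *data, int len)
{
    if (len <= 0)
        return "";
    assert(data != nullptr);
    // Only the head of the document is inspected (assumed limit: 1024 bytes).
    std::string head(data, static_cast<std::size_t>(std::min(len, 1024)));
    std::transform(head.begin(), head.end(), head.begin(),
                   [](unsigned char c) { return static_cast<char>(std::tolower(c)); });

    if (head.compare(0, 5, "<?xml") == 0) {
        // Restrict the search to the declaration itself.
        const std::string decl = head.substr(0, head.find("?>"));
        const std::string enc = valueAfterKey(decl, "encoding=");
        if (!enc.empty())
            return enc;
    }
    for (std::size_t meta = head.find("<meta"); meta != std::string::npos;
         meta = head.find("<meta", meta + 1)) {
        const std::size_t close = head.find('>', meta);
        const std::string tag = head.substr(
            meta, close == std::string::npos ? std::string::npos : close - meta);
        const std::string enc = valueAfterKey(tag, "charset=");
        if (!enc.empty())
            return enc;
    }
    return "";
}
```

The sketch lower-cases the input before searching because encoding names are matched case-insensitively; a caller would pass the result to a codec lookup.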

If it finds a Unicode BOM (byte order mark), it changes the encoding regardless of what the user has specified.
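For readers unfamiliar with byte order marks, the check can be sketched as follows. The helper name encodingFromBom() is hypothetical and the code is not taken from encodingdetector.cpp; which of these marks the class actually recognizes is not stated here.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper, not part of EncodingDetector: returns the encoding
// implied by a leading byte order mark, or "" if the data has none.
std::string encodingFromBom(const char *data, int len)
{
    assert(data != nullptr || len <= 0);
    const unsigned char *p = reinterpret_cast<const unsigned char *>(data);
    // UTF-32 is tested before UTF-16 because FF FE 00 00 begins with the
    // UTF-16LE mark FF FE.
    if (len >= 4 && p[0] == 0xFF && p[1] == 0xFE && p[2] == 0x00 && p[3] == 0x00)
        return "UTF-32LE";
    if (len >= 4 && p[0] == 0x00 && p[1] == 0x00 && p[2] == 0xFE && p[3] == 0xFF)
        return "UTF-32BE";
    if (len >= 3 && p[0] == 0xEF && p[1] == 0xBB && p[2] == 0xBF)
        return "UTF-8";
    if (len >= 2 && p[0] == 0xFE && p[1] == 0xFF)
        return "UTF-16BE";
    if (len >= 2 && p[0] == 0xFF && p[1] == 0xFE)
        return "UTF-16LE";
    return "";
}
```

A BOM is a property of the bytes themselves, which is a plausible reason for letting it override a user-chosen encoding.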

Intended lifetime of the object: one instance per document.

Typical use:

TQByteArray data;
...
EncodingDetector detector;
detector.setAutoDetectLanguage(EncodingDetector::Cyrillic);
TQString out=detector.decode(data);

Do not mix decode() with decodeWithBuffering().

Guess encoding of char array

Definition at line 57 of file encodingdetector.h.

Constructor & Destructor Documentation

◆ EncodingDetector() [1/2]

EncodingDetector::EncodingDetector ( )

The default codec is Latin-1 (as the HTML specification says), the EncodingChoiceSource is DefaultEncoding, and the AutoDetectScript is SemiautomaticDetection.

Definition at line 796 of file encodingdetector.cpp.

◆ EncodingDetector() [2/2]

EncodingDetector::EncodingDetector ( TQTextCodec *  codec,
EncodingChoiceSource  source,
AutoDetectScript  script = None 
)

Sets the default codec, the EncodingChoiceSource, and the AutoDetectScript.

Definition at line 800 of file encodingdetector.cpp.

Member Function Documentation

◆ analyze() [1/2]

bool EncodingDetector::analyze ( const char *  data,
int  len 
)

Analyze text data.

Returns
true if there was enough data for accurate detection

Definition at line 906 of file encodingdetector.cpp.

◆ analyze() [2/2]

bool EncodingDetector::analyze ( const TQByteArray &  data)

Analyze text data.

Returns
true if there was enough data for accurate detection

Definition at line 901 of file encodingdetector.cpp.

◆ decoder()

TQTextDecoder * EncodingDetector::decoder ( )
protected
Returns
TQTextDecoder for the detected encoding

Definition at line 841 of file encodingdetector.cpp.

◆ encoding()

const char * EncodingDetector::encoding ( ) const

Convenience method.

Returns
MIME name of the detected encoding

Definition at line 824 of file encodingdetector.cpp.

◆ errorsIfUtf8()

bool EncodingDetector::errorsIfUtf8 ( const char *  data,
int  length 
)
protected

Checks whether the data is really UTF-8.

Taken from Kate.

Returns
true if the current encoding is UTF-8 and the text cannot be in this encoding

Someone should read http://de.wikipedia.org/wiki/UTF-8 and check this code...

Definition at line 732 of file encodingdetector.cpp.
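As a reference point for reviewing this check, here is a minimal sketch of a strict UTF-8 well-formedness test. It is not the code from encodingdetector.cpp; the name hasUtf8Errors() is hypothetical, and it covers only the "text cannot be in this encoding" half of the return condition. It assumes that a sequence cut off by the end of the buffer is not an error, since the buffer may be only part of a document.

```cpp
#include <cassert>

// Hypothetical helper: returns true if the buffer contains a byte sequence
// that is not well-formed UTF-8.
bool hasUtf8Errors(const char *data, int length)
{
    assert(data != nullptr || length <= 0);
    const unsigned char *p = reinterpret_cast<const unsigned char *>(data);
    int i = 0;
    while (i < length) {
        const unsigned char lead = p[i++];
        if (lead <= 0x7F)
            continue;                        // plain ASCII
        int trail = 0;                       // continuation bytes expected
        unsigned char lo = 0x80, hi = 0xBF;  // range allowed for the next byte
        if (lead >= 0xC2 && lead <= 0xDF) {
            trail = 1;
        } else if (lead >= 0xE0 && lead <= 0xEF) {
            trail = 2;
            if (lead == 0xE0) lo = 0xA0;     // reject overlong 3-byte forms
            if (lead == 0xED) hi = 0x9F;     // reject UTF-16 surrogates
        } else if (lead >= 0xF0 && lead <= 0xF4) {
            trail = 3;
            if (lead == 0xF0) lo = 0x90;     // reject overlong 4-byte forms
            if (lead == 0xF4) hi = 0x8F;     // reject code points above U+10FFFF
        } else {
            return true;                     // 0x80..0xC1 and 0xF5..0xFF never lead
        }
        for (int k = 0; k < trail; ++k, ++i) {
            if (i >= length)
                return false;                // truncated at buffer end: tolerated here
            const unsigned char c = p[i];
            if (c < lo || c > hi)
                return true;
            lo = 0x80;                       // later continuation bytes use the
            hi = 0xBF;                       // ordinary 0x80..0xBF range
        }
    }
    return false;
}
```

Latin-1 text with accented letters almost always fails this test, which is what makes such a check useful for rejecting a wrong UTF-8 guess.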

◆ scriptForName()

EncodingDetector::AutoDetectScript EncodingDetector::scriptForName ( const TQString &  lang)
static

Takes the language name after it has been translated with i18n().

Definition at line 1166 of file encodingdetector.cpp.

◆ setEncoding()

bool EncodingDetector::setEncoding ( const char *  encoding,
EncodingChoiceSource  type 
)
Returns
true if the specified encoding was recognized

Definition at line 846 of file encodingdetector.cpp.


The documentation for this class was generated from the following files: