Abnormal strings in translation results: ãã£ã£ãç§ã¡ã¡åã #95

Open
gouyuwang opened this issue Feb 21, 2024 · 5 comments

@gouyuwang

gouyuwang commented Feb 21, 2024

In the screenshot, the yellow text is the garbled translation result and the white text is the original text.
[screenshot: bug]
The API used is: https://api.deepl.com/v2/translate

@JanEbbing
Member

Hi, could you please share the code you are running to get this result, and copy and paste the input text instead of posting it as a screenshot, so that I can reproduce the problem?
This might be an issue with the encoding used.

@gouyuwang
Author

gouyuwang commented Feb 21, 2024

Hi, @JanEbbing

func DeepLTranslate(srcLang, targetLang, text string) (string, error) {
	authKey := ""
	urlValues := url.Values{}
	urlValues.Add("auth_key", authKey)
	urlValues.Add("target_lang", targetLang)
	urlValues.Add("text", text)
	if len(srcLang) > 0 {
		urlValues.Add("source_lang", srcLang)
	}
	resp, err := ctxhttp.PostForm(context.Background(), &http.Client{}, "https://api.deepl.com/v2/translate", urlValues)
	if err != nil {
		return "", err
	}
	defer func(Body io.ReadCloser) {
		err := Body.Close()
		if err != nil {
			fmt.Printf("DeepL translate error: %+v\n", err)
		}
	}(resp.Body)

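	if resp.StatusCode != http.StatusOK {
		// err is nil at this point, so return an explicit error instead.
		return "", fmt.Errorf("DeepL translate: unexpected status %s", resp.Status)
	}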

	body, err := ioutil.ReadAll(resp.Body)
	if err != nil {
		return "", err
	}

	result := map[string][]map[string]string{}
	err = jsoniter.Unmarshal(body, &result)
	if err != nil {
		return "", err
	}
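	translations := result["translations"]
	if len(translations) == 0 {
		// Avoid an index-out-of-range panic on an empty response.
		return "", fmt.Errorf("DeepL translate: response contains no translations")
	}
	return translations[0]["text"], nil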
}

original text:
物を言ってるような感覚になってしまう でも海外からはアイコンタクトはとても 大事っていう形でまあそこの中でもいろ んなトラブルが起こってきたんですねな ぜかっていうとあの私たちの先生たちは アイコンタクトをちゃんと持ってあの授 業をするわけなんですけれどもある方が ちょっと誤解してしまってそれは恋愛感 情で自分を見つめられてるっていうよう な形からあのちょっと本当に大きなトラ ブルになってきたことがありましたでそ ういうところを通しながらまたある時に はですねあの謝るとか感謝するとかって いうところが日本人は何回も何回もあの するっていう習慣があると思うんですね でそれを例えば1週間後に例えば感謝の 気持ちを表さなかったって言ったら私た ���ちの先生がええあの感謝の気持ちが足り ��ないんじゃないかみたいな形で誤解し����

This text was extracted from the screenshot by OCR and may differ from the actual input.

I wonder whether it is caused by invalid characters in the content, such as the "?" symbols visible in the image.
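
One way to test that hypothesis is to check the recognized text for invalid bytes before sending it. The following is a minimal sketch, not part of the reporter's code: `sanitizeForTranslation` is a hypothetical helper that drops invalid UTF-8 byte sequences and literal U+FFFD replacement characters. Whether such characters actually trigger the garbled output is unconfirmed in this thread.

```go
package main

import (
	"fmt"
	"strings"
	"unicode/utf8"
)

// sanitizeForTranslation (hypothetical helper) removes invalid UTF-8 byte
// sequences and literal U+FFFD replacement characters from s.
func sanitizeForTranslation(s string) string {
	// Drop byte sequences that are not valid UTF-8.
	s = strings.ToValidUTF8(s, "")
	// Drop replacement characters that an upstream decoder already inserted.
	return strings.ReplaceAll(s, "\uFFFD", "")
}

func main() {
	// One invalid byte (0xff) and one literal replacement character.
	in := "ab\xffc\uFFFDd"
	fmt.Println(utf8.ValidString(in))       // false
	fmt.Println(sanitizeForTranslation(in)) // abcd
}
```

Logging the cases where `utf8.ValidString` reports false would also show whether the rare failures coincide with malformed input from the speech recognition step.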

@JanEbbing
Member

From which language to which language are you translating this input? I can translate it fine into British English and Chinese (it puts "���� " at the end, which was present in the source text). I think the most likely culprit is the encoding: our API returns UTF-8 encoded strings, and your system may default to a different encoding when decoding what the API returns, resulting in these weird characters. To fix that, I'd need to know where you translate this (a terminal shell, some web server, etc.), how you display it, and so on.
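
The effect described here can be reproduced in isolation. The sketch below is illustrative only and assumes the wrong encoding is ISO-8859-1 (Latin-1), which the thread does not confirm; `decodeAsLatin1` is a hypothetical helper that treats every byte of a UTF-8 string as its own character.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// decodeAsLatin1 (hypothetical helper) interprets each byte as one
// ISO-8859-1 code point, as a consumer would if it wrongly assumed a
// single-byte encoding for UTF-8 data.
func decodeAsLatin1(b []byte) string {
	runes := make([]rune, len(b))
	for i, c := range b {
		runes[i] = rune(c)
	}
	return string(runes)
}

func main() {
	original := "私たち" // three characters, three UTF-8 bytes each
	garbled := decodeAsLatin1([]byte(original))

	fmt.Println(utf8.RuneCountInString(original)) // 3
	fmt.Println(utf8.RuneCountInString(garbled))  // 9
	fmt.Printf("%q\n", garbled)
}
```

Each Japanese character turns into three unrelated characters, several of them invisible control characters, so the output contains sequences such as "ç§" and "ã" that resemble the string in the issue title.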

[screenshot]

@gouyuwang
Author

The Go project is deployed on a Linux machine in a server room in Hong Kong. The client uploads the original audio as a speech stream, our speech recognition service generates the text, we then call the DeepL translation API over HTTP, and the translation result is passed back to the Chrome client via WebSocket. The problem is rare: it has occurred about twice in six months. @JanEbbing

@JanEbbing
Member

Yes, but how and where is the text rendered? As I've shown, the API returns well-formed UTF-8 characters. If you get artifacts like the ones in the screenshot, it is most likely an encoding issue somewhere in this pipeline.
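
To narrow down which stage of such a pipeline introduces the problem, one option is to check the raw bytes at each hop (recognized text, HTTP response body, WebSocket payload) for malformed UTF-8. This is a rough sketch under that assumption; `firstInvalidUTF8` is a hypothetical helper, and the truncated slice only simulates a buffer cut in the middle of a multi-byte character.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// firstInvalidUTF8 (hypothetical helper) returns the byte offset of the
// first malformed UTF-8 sequence in b, or -1 if b is entirely valid.
func firstInvalidUTF8(b []byte) int {
	for i := 0; i < len(b); {
		r, size := utf8.DecodeRune(b[i:])
		// RuneError with size 1 signals a malformed sequence; a literal
		// U+FFFD in the input decodes with size 3 and is not flagged.
		if r == utf8.RuneError && size == 1 {
			return i
		}
		i += size
	}
	return -1
}

func main() {
	whole := []byte("私たち")
	fmt.Println(firstInvalidUTF8(whole)) // -1

	// Simulate a chunk boundary that cuts the last character short.
	truncated := whole[:8]
	fmt.Println(firstInvalidUTF8(truncated)) // 6
}
```

If every hop reports -1 and the text is still garbled on screen, the bytes are intact and the fault lies in how the receiving side decodes or renders them.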
