
Mailman: Character encoding issue?

Posted: Thu Oct 31, 2013 7:41 am
by jim
Does anyone have any idea why a message arriving (from Poland) intact in Mailman
(VERSION 1, below) would have "! " inserted in two places when viewed in a US
e-mail client (Outlook, for example)?

VERSION 1 was copied and pasted from the listserv archive ... it looks intact,
without the "! " characters seen in VERSION 2.

VERSION 2 was copied from my Outlook mailbox. This is NOT a local issue
(i.e., something I'm doing with Outlook): several people (including the
author in Poland) asked me why the message contained the "! " characters
AFTER it had been distributed by Mailman.

The "error" appears in two locations:

1. "Zakopane, Polan! d." (at the end of the first paragraph), and
2. "these da! tes on" (near the end of the second paragraph).

At first I assumed it was a character encoding issue, but now I'm not so sure.
Is Mailman masking the "real" character received from the Polish source, but
distributing a "non-standard" character to US (and other) mail clients?

Does anyone have any idea why this is happening? If so, please suggest
a solution--if there is one. Thanks for the help.

******************************************************************VERSION 1
Dear Friends and Colleagues,

Attached you may find the First Announcement of the 31st International
Congress on High-Speed Imaging and Photonics (ICHSIP31) to be held on 6-11
October 2014 in Zakopane, Poland.

If you are interested in attending or giving a presentation at ICHSIP31,
please mark these dates on your calendar.

We are going to inform you by the next announcements as more information
becomes available - this applies particularly to launch the ICHSIP31
website.

With best regards

Krzysztof Tomaszewski

ICHSIP31 Co-Chair



ACS Laboratory, ACS Group

23 Hery Street

01-497 Warsaw, POLAND

Phone:

e-mail:

******************************************************************VERSION 2
Dear Friends and Colleagues,

Attached you may find the First Announcement of the 31st International Congress on High-Speed Imaging and Photonics (ICHSIP31) to be held on 6-11 October 2014 in Zakopane, Polan! d.

If you are interested in attending or giving a presentation at ICHSIP31, please mark these da! tes on your calendar.

We are going to inform you by the next announcements as more information becomes available – this applies particularly to launch the ICHSIP31 website.

With best regards

Krzysztof Tomaszewski

ICHSIP31 Co-Chair



ACS Laboratory, ACS Group

23 Hery Street

01-497 Warsaw, POLAND

Phone:

e-mail:
******************************************************************

Re: Mailman: Character encoding issue?

Posted: Thu Oct 31, 2013 12:53 pm
by bobrk
I see this all over the place, on the web and in emails. I don't think it has anything to do with Poland.

Re: Mailman: Character encoding issue?

Posted: Thu Oct 31, 2013 1:29 pm
by jim
Thanks, bobrk. I'm inclined to agree with you. I think it has something to do with the structure of the text: new lines, special characters, a character buffer overflow, ... However, I thought the fact that the message came from Poland (where they employ different character sets) was relevant and should be included. Again, thanks.

Re: Mailman: Character encoding issue?

Posted: Mon Nov 04, 2013 2:21 pm
by anatola2
An American board member on one of my Mailman forums has extra spaces in all of her signature files. She typically uses Yahoo Mail, and I've found that Yahoo Mail is very problematic on other forums that require plain text.

I suggested that when she composed the template for her signature file, she hit 'shift-enter' instead of 'enter' to start a new line, and as long as she remembered to do this, it seemed to format her sig lines okay. However, she has a constant problem with Yahoo Mail reverting to rtf/html formatting, and it turned out to be more trouble for her than it was worth. So her sigs are always double spaced.

Re: Mailman: Character encoding issue?

Posted: Mon Nov 04, 2013 3:10 pm
by Guest
Thanks anatola2. I'm familiar with the extra "new line" characters often inserted into plain text when messages are converted from html. What I find strange about this particular sequence is their location ... in the middle of words, where the author of the message would not have inserted new line characters--that's what led me to think it might be a buffer overflow (i.e., too many characters in a line), but if that were the case I would expect consistency and the glitches to appear at a particular character count (255, for example).

Obviously, this is not a critical issue, but I'd like to understand the root of the problem. I've not seen the same issue in other messages sent by Mailman--and I've managed several lists for almost 10 years.

Any other suggestions? And thanks again.
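
One way to test the "particular character count" idea is to measure where each stray "! " lands. The following is only a rough sketch: the function name `stray_marker_offsets` and the shortened `sample` text are made up for illustration, and a meaningful check would need the raw message source as delivered, not the copy pasted from Outlook.

```python
# Hypothetical helper (not part of Mailman): list where a stray marker
# such as "! " appears, as (line number, character offset) pairs.
def stray_marker_offsets(text, marker="! "):
    hits = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        start = line.find(marker)
        while start != -1:
            hits.append((line_no, start))
            start = line.find(marker, start + 1)
    return hits


# Shortened excerpts from VERSION 2; offsets here are relative to the
# excerpt only, so they say nothing about the original line lengths.
sample = (
    "to be held on 6-11 October 2014 in Zakopane, Polan! d.\n"
    "please mark these da! tes on your calendar.\n"
)

for line_no, offset in stray_marker_offsets(sample):
    print(f"line {line_no}: marker at offset {offset}")
```

If the offsets measured in the raw source turned out to be the same on every affected line, or evenly spaced, that would point to a length limit somewhere along the delivery path. Irregular offsets would argue against it.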

Re: Mailman: Character encoding issue?

Posted: Wed Nov 06, 2013 4:16 pm
by anatola2
Jim, I understand that it is not critical. It's a major annoyance and sometimes actually causes some strife depending on the context.

I just understand it as general weirdness. It seems to be everywhere, and it is even worse when people who read their mail as HTML copy from rtf or HTML mail.

When I saw the double spaced sig line, I just wrote what popped into my mind. That wasn't helpful. The exclamation marks seem random to me and are likely due to some underlying message coding (sometimes full justification and other stuff like that).

But I have seen on various lists that people who write using Apple Mail, MSN, EarthLink, or Yahoo tend to have the "postests with the mostests" weirdness going on.

Pasted links from the usual suspects will be broken up so they look like http://w ww.mypersn aldo main.co m and then you get a rash of complaints that the link 'doesn't work'.

And rtf/html lines that do not have a true plain text soft carriage return before 80 characters will have = signs, and other stray characters (AO= etc) added at the end of eac=
h line sometimes in the middle of words. (example)
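
The trailing = signs look like the "soft line breaks" of quoted-printable encoding, which a mail reader is supposed to remove when it decodes the message. That is an assumption about what is happening on these lists, not something confirmed from the headers. A small sketch using Python's standard `quopri` module shows the effect (the sample sentence is borrowed from the announcement above):

```python
import quopri

# One long line with no line breaks, as an rtf/html composer might send it.
long_line = (
    "Attached you may find the First Announcement of the 31st "
    "International Congress on High-Speed Imaging and Photonics "
    "(ICHSIP31)."
).encode("utf-8")

# Quoted-printable keeps encoded lines short by inserting "=" followed by
# a newline (a soft line break) wherever a line would run too long.
encoded = quopri.encodestring(long_line)
print(encoded.decode("ascii"))

# A reader that decodes properly removes the soft breaks again.
decoded = quopri.decodestring(encoded)
print(decoded == long_line)
```

If a reader or archive shows the encoded form without decoding it, the = stays visible at the end of the line, and the break can fall in the middle of a word.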

Some posters use a -full- justification and have breaks in the middle of words, even on the shortest of lines (less than ten characters), and sometimes stray symbols are added (similar to the Polish example you give).

I did not see the full headers of the Polish post, which would normally show the software and the name of its proprietary coding format. Even with them, I would not actually be able to decipher enough to suggest a solution for the Polish guy.

I don't have an answer for you. I usually tell people who DO want to fix the problem to get a Google Mail account and use plain text with the UTF-8 setting. Then it ends there for that person, and for the people who copy and paste or quote their material back in discussion.

Do you know what reader he uses? People posting in English with the usual software suspects mentioned above may quote the first rtf message in a reply using another reader (using rtf or html), and then members of the forum can't reply because there is so much gibberish quoted back when they hit reply. Cleanup takes too much time.

This part doesn't have to do with email, but I'm also aware from academic materials I get from abroad (papers written for peer review) that sometimes people will save an English PDF in their native (European enabled Adobe PDF) tool and recirculate the PDF to peers. What then sometimes happens is that certain punctuation or mathematical symbols change from the English one to a special character, such as the symbol for sigma or German typesetting for native pronunciation. The latter sometimes seems random, and maybe a system reboot before the initial 'save' of the original paper might have prevented the strange glitch.

So basically it can be a tower of Babel, even if the text is fully English or is copied from full English using the English language abilities in foreign language European software.

That's the long way of saying it's actually normal these days. Asking people to abandon their favorite mail reader usually doesn't work. (Sorry for the length of this.)

Janice