Notepad++ Pasting Bug Found

truthseeker
Offline
Last seen: 12 years 9 months ago
Joined: 2008-07-30 20:32
Notepad++ Pasting Bug Found

As I have tried to tell people before, there is a paste bug in Notepad++: it adds "?" characters to pasted text.

Try this and see for yourself. Go to any YouTube video, then highlight and copy the entire text of a comment posted by someone. Paste it into a Notepad++ text file and notice that "?" characters appear randomly in the text.

Interestingly, some text pastes OK, but some gets a random "?".

So try a few different comments and copy and paste them.

agdurrette
Offline
Last seen: 1 month 1 week ago
Developer
Joined: 2008-01-16 13:55
That would be Youtube's

That would be YouTube's fault. In Windows Notepad I get a box, and in a local install of Notepad++ I get a "?". It's not just Notepad++ Portable.

"It's just an online installer. It's not going to mug you.", JTH
"The shell is the key to unlock Linux's greatest advantages."

Mark Sikkema
Offline
Last seen: 13 years 5 months ago
Developer
Joined: 2009-07-20 14:55
Change the encoding to anything but ANSI!

Just make sure Notepad++'s encoding is not set to ANSI.
With ANSI you're bound to get question-marks for any 'special' character.
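
A minimal Python sketch of why an ANSI document ends up with question marks. It assumes "ANSI" means the Windows-1252 code page (the usual Western default; other systems use other code pages), and the sample string is made up for illustration.

```python
# Assumption: "ANSI" here is Windows-1252 (cp1252); other locales use other code pages.
sample = "you\ufeff are"  # contains U+FEFF, which cp1252 cannot represent

# errors="replace" substitutes "?" for every character the code page lacks,
# which mirrors what an ANSI document ends up showing.
ansi_bytes = sample.encode("cp1252", errors="replace")
print(ansi_bytes)  # b'you? are'
```

Once the character has been replaced, the original is gone; switching the encoding afterwards cannot bring it back.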

Formerly Gringoloco
Windows XP Pro sp3 x32

truthseeker
Offline
Last seen: 12 years 9 months ago
Joined: 2008-07-30 20:32
Where do you turn off ANSI?

Instead of ANSI, which one do you select?

I tried UTF-8 and UTF-8 without BOM, but the "?" still appears when I copy and paste YouTube comments.

But I am trying UCS-2 Little Endian, and I think that may have solved it. Will test some more to find out for sure.

John T. Haller
Offline
Last seen: 2 hours 7 min ago
Admin, Developer, Moderator, Translator
Joined: 2005-11-28 22:21
Unicode Characters

OK, I just copied and pasted multiple YouTube comments and no special characters or question marks showed up. So it depends on WHAT SPECIFICALLY you are selecting.

If you select something that has a character outside of the current character set, it will show up as a ? in Notepad++, just like it will show up as a box in Notepad. That means the character can't be displayed.

You can set Encoding to UCS-2 Little Endian and it should be able to handle anything. If it can't, it's YouTube's fault for having random characters in it, not Notepad++'s. Or your browser's, since it is passing the string off to the clipboard. Notepad++ will just dump whatever is in the clipboard into the document and show any characters it can't display as ?. It's doing just what it's supposed to.

Sometimes, the impossible can become possible, if you're awesome!

truthseeker
Offline
Last seen: 12 years 9 months ago
Joined: 2008-07-30 20:32
Others are experiencing the

Others are experiencing the same problem as me. Question marks appear randomly, not in every YouTube message that is copied and pasted, but in a lot of them.

But what's strange is someone can type this on youtube for example.. "The abba song I heard was great".

And when I copy and paste that text into notepad++, it will appear as, "The abba? song I heard? was great"

UCS-2 Little Endian causes another problem: it sometimes (not always) places a small minus sign at the end of the pasted text. But if it stops the ? marks appearing in the body of pasted text, I will continue using UCS-2 Little Endian.

John T. Haller
Offline
Last seen: 2 hours 7 min ago
Admin, Developer, Moderator, Translator
Joined: 2005-11-28 22:21
Browser

Then it's your browser giving you issues. Either way, it's not an issue with Notepad++, that's just what's in your clipboard, so it pasted it. So, it's not a Notepad++ bug. And absolutely not a Notepad++ Portable bug.

I still can't figure out why anyone would copy and paste a YouTube comment. YouTube comments are the sewers of the social internet.

Sometimes, the impossible can become possible, if you're awesome!

truthseeker
Offline
Last seen: 12 years 9 months ago
Joined: 2008-07-30 20:32
John, I am using Firefox, so

John, don't get defensive, I am not having a go at your portableapps notepad++. So if I want to copy certain text from youtube, I have a right to do so. There may indeed be many comments on youtube that are from sewers of the social internet, but please stay on topic.

I am using Firefox, so what's wrong with firefox that causes this problem?

Secondly, how come when I paste the EXACT SAME text into Windows notepad and EditPad Lite, there are no "?" marks appearing as they do in notepad++?

How do you explain that, seeing it only happens in notepad++ and not other text editors?

(UPDATE: For the record I want to use notepad++ and not the other text editors because notepad++ remembers and saves all my text files when I reopen it and the others don't. Also, this is not just a portableapps issue I learned, because I downloaded the original installable version from their website, and the "?" issue happens in that version as well.)

J Neutron
Offline
Last seen: 9 months 4 weeks ago
Joined: 2008-06-10 19:26
You are tilting at the wrong windmill

If you want to complain about bugs in Notepad++ and Firefox and YouTube, I would respectfully suggest pursuing your case with the authors of each of those... and not constantly trying to show how PortableApps.com has done something (somehow) wrong.

Honestly, we have an association in Maine called the "Christian Civic League" (check it out on Google) and they are neither christian nor civic at all. It has made me look beyond the actual name and consider the intent.

neutron1132 (at) usa (dot) com

truthseeker
Offline
Last seen: 12 years 9 months ago
Joined: 2008-07-30 20:32
I was actually seeking a

I was actually seeking a SOLUTION. I will not pursue lodging bug reports because to be honest, I am too lazy to do so.

And Mark Sikkema may have found it when he said not to use ANSI. So I changed it from ANSI, and I will test out UCS-2 Little Endian and see how it goes.

Mark Sikkema
Offline
Last seen: 13 years 5 months ago
Developer
Joined: 2009-07-20 14:55
All of the unicode encodings

All of the Unicode encodings should work and are working fine (at least on my system).
The Unicode encodings are UTF-8 (with or without the BOM) and UCS-2 LE & BE.
Any of the 'Character sets' are just additional ANSI code-pages.
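
A small Python sketch of the point above, that the Unicode encodings keep the stray character intact. Python has no separate UCS-2 codec, so `utf-16-le` and `utf-16-be` stand in for UCS-2 LE and BE here; that substitution is an assumption and only holds for characters in the Basic Multilingual Plane. The sample string is made up.

```python
sample = "you\ufeff are"  # U+FEFF sits in the middle of the text

# Assumption: utf-16-le / utf-16-be stand in for UCS-2 LE / BE,
# which is equivalent for Basic Multilingual Plane characters.
results = {}
for codec in ("utf-8", "utf-16-le", "utf-16-be"):
    # Encode and decode again; a lossless encoding returns the same string.
    results[codec] = sample.encode(codec).decode(codec) == sample
    print(codec, results[codec])
```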

Formerly Gringoloco
Windows XP Pro sp3 x32

truthseeker
Offline
Last seen: 12 years 9 months ago
Joined: 2008-07-30 20:32
UPDATE: I did some tests

UPDATE: I did some tests based on this. And yes, the question marks are there. When I did the paste to Notepad++, it put blanks. So, using Firefox, I selected the text in YouTube and did a "view selection source". There was an extra character there. The following is the text of a comment:

Lets see you forgot Jimmi, Eric Johnson, Eric Clapton, Kirk Hammet, you are such a.....

I can see in this edit box a character between the word "you" and "are". But, when it is redisplayed in the preview, the character doesn't show.

Try it yourself, copy that text and then paste it into a notepad++ text file and you will see a character between "you" and "are".

What I would say is happening is that a character is inserted in the text by YouTube. This is interpreted by the browser as a space (or by the character set in use). When the text is cut-and-pasted to Notepad++, it sees the character and can't represent it so it replaces it with a question mark. Notepad decides to display a space but it keeps the character there.

The real question is why does YouTube put the character there? Strange.
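
One way to see such a hidden character without relying on how an editor draws it is to list every non-ASCII character with its code point and Unicode name. This is a sketch only: the helper name `show_hidden` is hypothetical, and the sample assumes the stray character is U+FEFF, which is what a later post in this thread identifies.

```python
import unicodedata

def show_hidden(text: str) -> list[tuple[int, str, str]]:
    """Return (index, code point, Unicode name) for every non-ASCII character."""
    return [
        (i, f"U+{ord(ch):04X}", unicodedata.name(ch, "<unnamed>"))
        for i, ch in enumerate(text)
        if ord(ch) > 127
    ]

# Assumption: the invisible character between "you" and "are" is U+FEFF.
pasted = "you\ufeff are such a....."
for entry in show_hidden(pasted):
    print(entry)  # (3, 'U+FEFF', 'ZERO WIDTH NO-BREAK SPACE')
```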

depp.jones
Offline
Last seen: 1 hour 11 min ago
Developer, Translator
Joined: 2010-06-05 17:19
Unicode

As stated before: it's a Unicode character. Pasting your text into an ANSI document in Notepad++ produces a ?, while pasting it into a Unicode document produces a small dot (I also don't know why YouTube uses that). I don't find a bug here.
Notepad++ represents characters not defined by the selected Encoding with a ?, while Notepad (and some others) display all Unicode characters by default. Did you try to save the document in Notepad as ANSI? It warns you that some Unicode characters would be lost. Re-open that file and look at what you find.

yours,
dj

ottosykora
Offline
Last seen: 17 hours 22 min ago
Joined: 2007-10-11 17:48
some idea:

I have no clue, in fact, why some chars should be added to some YouTube videos, but for those who downloaded the steganography files from my box.net: there is one in there demonstrating such use of non-displayable chars.
It is possible to add extra special chars which cannot be displayed by common programs, or which are simply displayed as empty space or a similar 'invisible' char.
By doing that, one can hide additional information or a message inside what looks at first view like normal clear text.

So it could be that YouTube feels that some extra special chars, normally not displayed in a browser, should be added to tell someone else, or YouTube itself, that this string originated from this and that, at this and that time, etc.

What they did not think of is that some software will not behave like most standard Windows viewers and will not display a 'blank' but something else.
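
The hiding trick described above could look roughly like the following Python sketch. Everything in it is an assumption made for illustration (the choice of U+200B and U+200C as the two "bit" characters, the `hide` and `reveal` helper names, the sample strings); it is not taken from the box.net files and says nothing about what YouTube actually does.

```python
# Hypothetical scheme: two invisible characters encode the bits of a hidden message.
ZERO = "\u200b"  # ZERO WIDTH SPACE stands for bit 0
ONE = "\u200c"   # ZERO WIDTH NON-JOINER stands for bit 1

def hide(cover: str, secret: str) -> str:
    """Append the secret to the cover text as a run of invisible characters."""
    bits = "".join(f"{byte:08b}" for byte in secret.encode("utf-8"))
    return cover + "".join(ONE if bit == "1" else ZERO for bit in bits)

def reveal(text: str) -> str:
    """Collect the invisible characters and turn them back into the secret."""
    bits = "".join("1" if ch == ONE else "0" for ch in text if ch in (ZERO, ONE))
    data = bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))
    return data.decode("utf-8")

stego = hide("The abba song I heard was great", "id42")
print(reveal(stego))  # id42
```

Pasted into an ANSI document, each of those invisible characters would turn into a "?", which is the kind of side effect described above.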

Otto Sykora
Basel, Switzerland

Ken Herbert
Online
Last seen: 54 min 8 sec ago
Developer, Moderator
Joined: 2010-05-25 18:19
There doesn't even have to be

There doesn't even have to be a hidden message or anything so sinister to it. There are at least eight HTML entities and at least 50 Unicode characters that show up in HTML as a space or even nothing, many of which probably do not translate well to ASCII.
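
A few such characters can be listed from Python's `unicodedata` module. The selection below is illustrative only, not the full set mentioned above, and how a browser actually renders each one is an assumption.

```python
import unicodedata

# A handful of characters that typically render as a space or as nothing.
# Illustrative selection only, not an exhaustive list.
candidates = ["\u00a0", "\u2003", "\u200b", "\u200c", "\u200d", "\u2060", "\ufeff"]

for ch in candidates:
    # b"?" means the character has no ASCII equivalent and gets replaced.
    in_ascii = ch.encode("ascii", errors="replace")
    print(f"U+{ord(ch):04X}", unicodedata.category(ch), unicodedata.name(ch), in_ascii)
```

Category `Zs` marks space separators and `Cf` marks invisible format characters; none of the characters in this list survives a conversion to ASCII.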

ottosykora
Offline
Last seen: 17 hours 22 min ago
Joined: 2007-10-11 17:48
yes sure

Those are the chars I meant. Since they are normally not displayed, they can be used as bit-coded whatever inside the text. So what I meant is that they might be entered there for a very particular purpose: either to be displayed in some specially set up viewer, or to carry some additional info. Or simply to make it more difficult for some people to copy the text to another place, etc.

Otto Sykora
Basel, Switzerland

SakiTC
Offline
Last seen: 3 years 6 months ago
Joined: 2008-06-13 02:05
Found the character in question

It's U+FEFF: Zero width no-break space.
So, if you want to get rid of those characters in Notepad++, try the following.
First, be sure to paste data in unicode encoding (any one that gave you "dots" and not question marks).
Second, open "Replace" (Ctrl+H). Make sure that "Match whole word only" is not checked and that the "Search mode" is "Extended".
In "Find what" box, write "\uFEFF" (without the quotation marks, of course) and leave the "Replace with" box empty.
After clicking on "Replace all", all the dots should disappear.

Edit: It seems that YouTube adds the character to each comment, adds it only once and does it at the end of a word.
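
For text handled outside Notepad++, the same clean-up as the Replace steps above can be sketched in Python. The helper name `strip_zero_width_nbsp` and the sample string are made up for illustration.

```python
def strip_zero_width_nbsp(text: str) -> str:
    """Remove every U+FEFF (zero width no-break space) from the text."""
    return text.replace("\ufeff", "")

# Hypothetical pasted comment with the character at the end of a word.
pasted = "The abba\ufeff song I heard was great"
print(strip_zero_width_nbsp(pasted))  # The abba song I heard was great
```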

No typin th las lette ca sav yo plent o spac

Moonbase
Offline
Last seen: 11 years 4 weeks ago
Joined: 2010-09-09 06:16
Probably just a transcoding bug

Unfortunately, many libraries (and text parsers) aren’t (yet) handling the Unicode Transformation Formats well (like UTF-8, UTF-16, UTF-32), or choke on UCS-2 (which, contrary to popular belief, doesn’t contain all currently defined Unicode characters).

I’ve also seen source code for many "Unicode" handlers, written hastily (and badly) and checking just a few bit patterns ("nobody will ever use these other characters anyway …").
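
The UCS-2 limitation can be illustrated with a short Python sketch: UCS-2 stores every character in a single 16-bit unit, so any character that needs two UTF-16 code units falls outside it. The two sample characters are chosen arbitrarily for illustration.

```python
bmp_char = "\ufeff"         # inside the Basic Multilingual Plane
astral_char = "\U0001F600"  # outside it, so not representable in UCS-2

for ch in (bmp_char, astral_char):
    # Each UTF-16 code unit is two bytes; UCS-2 allows exactly one unit per character.
    units = len(ch.encode("utf-16-le")) // 2
    print(f"U+{ord(ch):04X} needs {units} UTF-16 code unit(s)")
```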

So it might just be sloppy coding somewhere, a transcoding problem. And I’ve also seen a few of these in Scintilla (the editing component programs like Notepad++ and Notepad2 use).

Until everyone does everything perfectly (hee, hee), we just might have to live with it and edit text manually, at times. I reckon.

truthseeker
Offline
Last seen: 12 years 9 months ago
Joined: 2008-07-30 20:32
Moonbase, yep hehe

Moonbase, yep, hehe
