Ruby invalid UTF-8: add to top of file (2024)

I am getting the error message below and want to find out where the invalid byte sequence is, so I can fix it.

The obvious solution is to trace the input lines being read by printing them out as they are read. Neither the --trace option nor the --verbose option has this effect.

Does anybody have any suggestions for locating the offending bytes in my AsciiDoc input file? I am running the latest release, downloaded from GitHub today.

Obviously this is also a bug in Asciidoctor; it should either complain about the byte or handle it.

/usr1/expsrc/asciidoctor-master/bin> ./asciidoctor /usr1/rbook/T/twords/rbook.txt --trace --verbose --safe-mode secure
/usr1/expsrc/asciidoctor-master/lib/asciidoctor/parser.rb:726:in `=~': invalid byte sequence in UTF-8 (ArgumentError)
    from /usr1/expsrc/asciidoctor-master/lib/asciidoctor/parser.rb:726:in `block in next_block'
    from /usr1/expsrc/asciidoctor-master/lib/asciidoctor/reader.rb:454:in `read_lines_until'
    from /usr1/expsrc/asciidoctor-master/lib/asciidoctor/parser.rb:716:in `next_block'
    from /usr1/expsrc/asciidoctor-master/lib/asciidoctor/parser.rb:303:in `next_section'
    from /usr1/expsrc/asciidoctor-master/lib/asciidoctor/parser.rb:291:in `next_section'
    from /usr1/expsrc/asciidoctor-master/lib/asciidoctor/parser.rb:291:in `next_section'
    from /usr1/expsrc/asciidoctor-master/lib/asciidoctor/parser.rb:52:in `parse'
    from /usr1/expsrc/asciidoctor-master/lib/asciidoctor/document.rb:448:in `parse'
    from /usr1/expsrc/asciidoctor-master/lib/asciidoctor.rb:1337:in `load'
    from /usr1/expsrc/asciidoctor-master/lib/asciidoctor.rb:1415:in `convert'
    from /usr1/expsrc/asciidoctor-master/lib/asciidoctor/cli/invoker.rb:93:in `block in invoke!'
    from /usr1/expsrc/asciidoctor-master/lib/asciidoctor/cli/invoker.rb:85:in `each'
    from /usr1/expsrc/asciidoctor-master/lib/asciidoctor/cli/invoker.rb:85:in `invoke!'
    from ./asciidoctor:10:in `<main>'


With the latest Asciidoctor v1.5.0 I get the same error, "invalid byte sequence error in UTF-8", when I use German umlauts like "äüö" in my *.adoc files.

Asciidoctor does not compile this very basic Example.adoc and emits the error message above. So there's no way to compile any AsciiDoc file with German umlauts to HTML. That's really annoying.

My system is Windows 7 64-bit, German version, with Ruby v1.9.3p545.


Do you have test document, an actual file, not just the contents?

On Sunday, August 24, 2014, Chris [via Asciidoctor :: Discussion] <[hidden email]> wrote:

With the latest Asciidoctor v1.5.0 I get the same error, "invalid byte sequence error in UTF-8", when I use German umlauts like "äüö" in my *.adoc files.

Example.adoc:

Test äöü

Asciidoctor does not compile this very basic Example.adoc and emits the error message above. So there's no way to compile any AsciiDoc file with German umlauts to HTML. That's really annoying.

My system is Windows 7 64-bit, German version.



Huh. Not seeing the links. Oh well. Anyway, which version of Ruby are you using? The second error is definitely the wrong BOM as the first character in the file. As for the first error, it is odd that you are able to have two encodings in the same file.

On Sunday, August 24, 2014, Chris [via Asciidoctor :: Discussion] <[hidden email]> wrote:

Hi Jason,

Now there are two download links in my first post above, with the following example files:

Sublime Text 3: sublime_text.adoc -> Asciidoctor v1.5.0 error message: "incompatible character encodings: UTF-8 and US-ASCII"

Windows 7 Editor (Notepad): windows_notepad.adoc -> Asciidoctor v1.5.0 error message: "invalid byte sequence in UTF-8"



I want to be absolutely clear, because there's a lot of potential confusion around this subject. Asciidoctor fully supports UTF-8 and thus the entire set of characters defined by the Unicode specification (in other words, all characters).

When Asciidoctor has problems processing documents, it's a problem that is inherited from the misunderstanding between Ruby and the operating system.

We are at a point in global technology where all systems should be using UTF-8 (or UTF-16) by default. Linux has supported this mode for nearly a decade, if not more. Unfortunately, Windows seems to be stubborn about this topic and insists on defaulting to regional charsets. Since you're using a German version of Windows, and you are getting this error, I'm fairly certain your system charset is not configured as UTF-8.

Unfortunately, there's no (easy) way for Ruby to know that it's not getting a UTF-8 document, or that the system is not in UTF-8. To make matters more complicated, Ruby seems to be configured differently on Windows than it is on other operating systems. It's extremely rare that you would see this error on Linux, if at all.

To move forward, what we need to understand is what flags need to be set to get all the parts playing in the same UTF-8 sandbox. To start, make sure you save your text files using UTF-8 encoding. I think it's bad practice in the modern era to save text files any other way, thus I want to stay away from input / output encoding settings in Asciidoctor.

Once your document is encoded in UTF-8 (or if it already is), we need to get into the business of figuring out which settings we need to document so that Ruby reads and writes the file as UTF-8, even if the system is not set to a UTF-8 locale. If we can get this properly documented, we should be able to confidently handle these types of problems in the future.

To close off this reply, I want to emphasize again that there is no code inside of Asciidoctor that would affect the processing of these characters. Asciidoctor assumes it's reading UTF-8 source and it writes UTF-8 output.

One way or another, we'll get this sorted out for sure!
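As a rough illustration of "reading the file as UTF-8 even if the system is not set to a UTF-8 locale", here is a minimal Ruby sketch. It is not Asciidoctor's own code; the helper name read_utf8 and the sample file are made up for the example. It passes an explicit encoding to File.read so the result does not depend on Encoding.default_external, and the "BOM|UTF-8" form also drops a leading byte order mark if the editor wrote one.

```ruby
# encoding: UTF-8
require "tempfile"

# Hypothetical helper: read a file as UTF-8 no matter what the system
# locale (Encoding.default_external) is. "BOM|UTF-8" also strips a
# leading byte order mark when one is present.
def read_utf8(path)
  File.read(path, mode: "r:BOM|UTF-8")
end

# Demo file: a BOM followed by "Test äöü", written as raw UTF-8 bytes.
sample = Tempfile.new(["example", ".adoc"])
sample.close
File.binwrite(sample.path, "\uFEFFTest äöü\n")

text = read_utf8(sample.path)
puts text.encoding            # the string is tagged as UTF-8
puts text.valid_encoding?     # and its bytes are valid UTF-8
puts text.start_with?("Test") # the BOM is no longer part of the text
```

The same mode string should work with File.open; whether it fits a given Asciidoctor setup is something to verify locally.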


In reply to this post by LightGuardjp

The links are in my second post.

Ruby v1.9.3p545 is installed on my Windows 7 64-bit (German) system.

1. "sublime_text.adoc" was created with "Sublime Text 3"
2. "windows_notepad.adoc" was created with Windows 7 Editor (Notepad)

Windows 7 Editor (Notepad) is on every Windows system, and Sublime Text 3 is a commonly used editor on Windows, Mac and Linux.

Is this an editor-related error? Do you know an editor for Windows that will work with Asciidoctor without errors?


Administrator

In reply to this post by LightGuardjp

Chris and Jason,

The problem isn't necessarily the files themselves, it's the file + the system it's being processed on. Jason, if you downloaded this file and tried it, you won't get the same results as Chris because you are not running the German version of Windows (or Ruby on Windows for that matter).

However, what we do want to establish first and foremost is that the file is being saved with UTF-8 encoding. That is a prerequisite for Asciidoctor...because it's hard enough just getting that right, we don't want to get into the business of mixing encodings...there's only pain and suffering down that path.

Once we have the document in UTF-8, we need to ensure that Ruby is reading and writing it as UTF-8. It should be, but we need to know more about your system.

Can you run:

$ asciidoctor -v

and print the results.

I just thought of something. I should print the system charset in the asciidoctor -v output. That will be very helpful when debugging things. I'll add that feature to master.

-Dan


Hi Dan,

asciidoctor -v:

Asciidoctor 1.5.0 [http://asciidoctor.org]
Runtime Environment (ruby 1.9.3p545 (2014-02-24) [i386-mingw32])


Administrator

Chris,

As I mention in the AsciiDoc Writer's Guide, I strongly recommend against using Notepad. It is a seriously broken program on so many levels. It's true that it's on every Windows system, but gum is on every sidewalk, doesn't mean you should chew it...if you know what I'm saying :)

Other Windows users typically recommend Notepad++. However, I find its user interface cluttered. These days, I'm recommending the Atom editor developed by GitHub and the community. Atom is cross-platform, it uses modern web technologies under the hood, and it even has an AsciiDoc preview plugin based on Asciidoctor!

You won't be disappointed. The one catch is that you have to build it on Windows to install it, but there are instructions that hopefully make that reasonably straightforward.

One more thing, could you run the following command and paste the output?

$ ruby -e 'puts [Encoding.default_external,Encoding.default_internal,"".encoding,__ENCODING__] * ","'

If you get an error, try this one:

$ ruby -e 'puts [Encoding.default_external,Encoding.default_internal,"".encoding] * ","'
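If quoting the one-liner is awkward in a particular shell, roughly the same information can be printed from a small script. This is only a sketch: the method name encoding_report and the labels are invented here, and it adds the locale and filesystem encodings, which the one-liners above do not print.

```ruby
# encoding: UTF-8

# Hypothetical helper: collect the encodings Ruby is currently using.
def encoding_report
  {
    "default_external" => Encoding.default_external,
    "default_internal" => Encoding.default_internal, # nil unless set
    "string literal"   => "".encoding,
    "source file"      => __ENCODING__,
    "locale"           => Encoding.find("locale"),
    "filesystem"       => Encoding.find("filesystem")
  }
end

# Print one labeled line per setting.
encoding_report.each do |label, enc|
  puts format("%-18s %s", label + ":", enc.nil? ? "(not set)" : enc.name)
end
```

Saved under any file name and run with ruby, it prints one line per setting, so a regional code page shows up next to the setting it comes from.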

-Dan


Administrator

If Atom doesn't work for you, try Brackets. It's very similar to Atom, and also has AsciiDoc support, except it also has a Windows MSI installer :)

-Dan


Administrator

Keep in mind that Asciidoctor supports Ruby 1.8.7 and above with no problem. It's just that Ruby 1.9.3 brings its own baggage...and Asciidoctor inherits that. Ruby 1.9.3 works just fine when everything is UTF-8, including the system. When it's not, things go sideways. It was because of these problems that the Ruby developers learned about encoding and fixed their ways in Ruby 2.

-Dan


ruby -e 'puts [Encoding.default_external,Encoding.default_internal,"".encoding,__ENCODING__] * ","' :

CP850,,CP850,CP850


Administrator

One option is to add the following line to the top of the asciidoctor.rb script:

Encoding.default_external = "UTF-8"

I had considered doing this at some point, but it's not recommended that gems mess with this setting. Perhaps I can put it in the asciidoctor command though.
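To see what that setting changes, here is a small sketch (the helper read_with_default and the sample file are hypothetical, not part of Asciidoctor). It reads the same UTF-8 file twice, once with the default external encoding set to CP850, as reported above, and once with it set to UTF-8, and returns the encoding Ruby tagged the text with.

```ruby
# encoding: UTF-8
require "tempfile"

# Hypothetical helper: read a file under a given default external
# encoding, then restore the previous setting. Returns the encoding
# Ruby tagged the string with and whether the bytes are valid in it.
def read_with_default(path, default)
  previous = Encoding.default_external
  Encoding.default_external = default
  text = File.read(path)
  [text.encoding, text.valid_encoding?]
ensure
  Encoding.default_external = previous
end

# Demo file: "Test äöü" stored on disk as UTF-8 bytes.
file = Tempfile.new(["example", ".adoc"])
file.close
File.binwrite(file.path, "Test äöü\n")

p read_with_default(file.path, "CP850") # text is tagged as CP850
p read_with_default(file.path, "UTF-8") # text is tagged as UTF-8
```

With CP850 as the default, the UTF-8 bytes are silently treated as CP850 characters, which is presumably where later "incompatible character encodings" errors come from. Ruby's -E command-line option (for example, ruby -E UTF-8) sets the same default without editing any script.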


I followed your recommendation and updated to Ruby v2.0.0p481 via the Windows installer (http://rubyinstaller.org).

After that the Sublime Text 3 Asciidoctor file (sublime_text.adoc) compiles without an error!


The windows_notepad.adoc example file got the same error message as before but I don't care because I don't use that crappy editor anyway with Asciidoctor.

Many thanks for your help Dan!


The only small error which is left is shown in my Firefox browser. The Asciidoctor example file compiled to html shows:

Test äöü

Last updated 2014-08-24 08:36:42 Mitteleuropõische Sommerzeit

In the last line it should say "Mitteleuropäische Sommerzeit", not "Mitteleuropõische". Don't know what kind of issue that is.
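One plausible explanation, offered as a guess rather than a confirmed diagnosis: Windows supplies the time zone name in the Windows-1252 code page, where "ä" is the byte 0xE4, while Ruby labels it as CP850 (the default external encoding reported earlier), where 0xE4 is "õ". The sketch below reproduces that mix-up; the method name is made up for the example.

```ruby
# encoding: UTF-8

# Hypothetical helper: simulate text that was written as Windows-1252
# bytes, labeled as CP850, and then converted to UTF-8 for output.
def misread_1252_as_850(text)
  text.encode("Windows-1252").force_encoding("CP850").encode("UTF-8")
end

puts misread_1252_as_850("Mitteleuropäische Sommerzeit")
# prints "Mitteleuropõische Sommerzeit"
```

If this is what happens, the document itself is fine and only the system-supplied time zone string in the footer is affected.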


Administrator

\o/

I'll be sure to do some testing on a German version of Windows so I'm well informed about the circumstances. What's important is that you can proceed!!

How to fix UTF-8

To fix UTF-8 CSV encoding errors:

1. Choose File -> Save As from the menu.
2. In the "Save as type" dropdown, select Comma Separated Values (*.csv).
3. Select Web Options from the Tools... dropdown at the bottom of the dialog box.
4. Select the Encoding tab.
5. In the "Save this document as:" dropdown, select Unicode (UTF-8).

Why is my file not UTF-8

UTF-8 is the dominant character encoding format on the Internet. This error occurs because the software you use encodes the file in a different format, such as ISO-8859, instead of UTF-8. There are different solutions you can use to change your file to UTF-8 encoding, such as Gmail or Google Drive.

What is invalid UTF-8

When data is processed or stored using different character encodings, it can lead to invalid UTF-8 encoding. For example, if data is originally encoded in ISO-8859-1 (Latin-1) and then treated as UTF-8, every non-ASCII character is a single byte of 0x80 or above, and such bytes usually do not form valid UTF-8 sequences, which results in encoding errors.
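A short Ruby sketch of that situation, using the "Test äöü" example from the thread (the helper name latin1_to_utf8 is made up): the Latin-1 bytes fail validation when labeled as UTF-8, and relabeling them with their real encoding before converting repairs the text.

```ruby
# encoding: UTF-8

# "Test äöü" as ISO-8859-1 bytes: each umlaut is one byte >= 0x80.
latin1_bytes = "Test äöü".encode("ISO-8859-1").b

# Label the bytes as UTF-8, as a reader that assumes UTF-8 would.
mislabeled = latin1_bytes.dup.force_encoding("UTF-8")
puts mislabeled.valid_encoding?   # false

# Hypothetical helper: relabel with the real encoding, then transcode.
def latin1_to_utf8(bytes)
  bytes.dup.force_encoding("ISO-8859-1").encode("UTF-8")
end

puts latin1_to_utf8(latin1_bytes) # Test äöü
```

This only works when the real source encoding is known; relabeling with the wrong one produces wrong characters without raising any error.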

What is an invalid byte sequence in UTF-8

Sometimes when importing a CSV, you may run into an "invalid byte sequence in UTF-8" error. This error typically happens when the file you are attempting to upload is not in UTF-8 format.
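To find where the offending bytes are, which is what the first post in this thread asks, a file can be scanned line by line in Ruby. The following is a minimal sketch under the assumption that the file is meant to be UTF-8; the method names and the sample file are invented for the example.

```ruby
# encoding: UTF-8
require "tempfile"

# Byte offset of the first byte in line that is not valid UTF-8,
# or nil when the whole line is valid.
def first_invalid_offset(line)
  offset = 0
  line.each_char do |ch|
    return offset unless ch.valid_encoding?
    offset += ch.bytesize
  end
  nil
end

# Scan a file and return [line_number, byte_offset, hex_byte] for the
# first invalid byte on every line that is not valid UTF-8.
def invalid_utf8_lines(path)
  bad = []
  File.open(path, "rb") do |io|
    io.each_line.with_index(1) do |raw, number|
      line = raw.dup.force_encoding("UTF-8")
      offset = first_invalid_offset(line)
      bad << [number, offset, format("%02X", line.getbyte(offset))] if offset
    end
  end
  bad
end

# Demo: line 2 holds "Test äöü" saved as ISO-8859-1 instead of UTF-8.
sample = Tempfile.new(["rbook", ".txt"])
sample.close
File.binwrite(sample.path, "= Title\n".b + "Test \xE4\xF6\xFC\n".b)

p invalid_utf8_lines(sample.path) # prints [[2, 5, "E4"]]
```

Each entry gives the line number, the byte offset within that line, and the first bad byte in hex, which is usually enough to find the character in an editor.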