Discussion:
[lfs-support] Kernel bug involving physical to virtual remapping
Hazel Russman
2018-07-14 16:56:43 UTC
Gentlemen,

I was given your contact details by Michael Shell, who has been helping me to troubleshoot this problem via the Linux From Scratch support list.

For some time now I have been unable to boot recent kernels (4.14 or later) on my rather elderly desktop machine. The kernel panics during boot, and the problem seems (superficially) to lie in the ACPI driver; at least, that is where the visible error messages come from. Booting with "acpi=off" works but is hardly an ideal solution.

However, a git bisection showed that this is actually a memory management issue. The kernel commit that caused the problem is:
[33c2b803edd13487518a2c7d5002d84d7e9c878f] x86/mm: Remove phys_to_virt() usage in ioremap().

Reintroducing the code:

    if (is_ISA_range(phys_addr, last_addr))
        return (__force void __iomem *)phys_to_virt(phys_addr);

makes the system bootable again. I have also tested this on a 4.15 kernel and it works there too.
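
For readers following along, the effect of the reintroduced check can be sketched in user space. This is only an illustration, not kernel code. The ISA window bounds (0xa0000 to 0x100000), the inclusive comparison on the end of the range, and the 32-bit PAGE_OFFSET of 0xC0000000 are assumptions based on typical x86 kernels of that era. Every sketch_* name is a hypothetical stand-in for the kernel helper it imitates.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed bounds of the legacy ISA window (640 KiB up to 1 MiB). */
#define SKETCH_ISA_START   0xa0000u
#define SKETCH_ISA_END     0x100000u

/* Assumed start of the kernel direct mapping on 32-bit x86 (3G/1G split). */
#define SKETCH_PAGE_OFFSET 0xC0000000u

/* Hypothetical stand-in for is_ISA_range(): true when the whole range
 * [start, end] lies inside the ISA window. */
static inline bool sketch_is_isa_range(uint64_t start, uint64_t end)
{
    return start >= SKETCH_ISA_START && end <= SKETCH_ISA_END;
}

/* Hypothetical stand-in for phys_to_virt(): in the direct map, a virtual
 * address is the physical address plus a fixed offset. */
static inline uint64_t sketch_phys_to_virt(uint64_t phys)
{
    return phys + SKETCH_PAGE_OFFSET;
}

/* Mirrors the shape of the reintroduced check: a request that falls
 * entirely inside the ISA window is answered from the existing direct
 * mapping. Returns 0 when the request would take the normal remap path
 * (not modelled here) or is invalid. */
static inline uint64_t sketch_ioremap_fast_path(uint64_t phys_addr, uint64_t size)
{
    uint64_t last_addr = phys_addr + size - 1;

    if (size == 0 || last_addr < phys_addr)
        return 0;                       /* empty or wrapping request */
    if (sketch_is_isa_range(phys_addr, last_addr))
        return sketch_phys_to_virt(phys_addr);
    return 0;                           /* would be remapped normally */
}

/* Small self-check that can be called from main(). */
static inline void sketch_self_check(void)
{
    assert(sketch_ioremap_fast_path(0xf0000, 0x1000) == 0xC00f0000u);
    assert(sketch_ioremap_fast_path(0x200000, 0x1000) == 0);
}
```

Calling sketch_self_check() from a main() exercises both paths: a request inside the ISA window is served from the direct map, and one above 1 MiB is not.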

If you want me to carry out any further tests, I would be happy to oblige, but do please bear in mind that I am not an expert, so you will need to give fairly basic instructions.

Hazel Russman
--
--
http://lists.linuxfromscratch.org/listinfo/lfs-support
FAQ: http://www.linuxfromscratch.org/blfs/faq.html
Unsubscribe: See the above information page

Do not top post on this list.

A: Because it messes up the order in which people normally read text.
Q: Why is top-posting such a bad thing?
A: Top-posting.
Q: What is the most annoying thing in e-mail?
Frans de Boer
2018-07-14 20:02:17 UTC
Thanks, Hazel,

I will try this in the coming days.

--- Frans.
Frans de Boer
2018-07-17 12:06:09 UTC
Post by Hazel Russman
> [33c2b803edd13487518a2c7d5002d84d7e9c878f] x86/mm: Remove
> phys_to_virt() usage in ioremap().
> "if (is_ISA_range(phys_addr, last_addr))
> return (__force void __iomem *)phys_to_virt(phys_addr);"
> makes the system bootable again. I have also tested this on a 4.15 kernel and it works there too.
Hazel, sorry, but where should I remove phys_to_virt()? If I delete the
complete if statement in the iounmap function and replace it with the
above code, I get compile errors.

By the way, acpi=off does not solve the issue either.

Frans.
Michael Shell
2018-07-17 13:02:46 UTC
On Tue, 17 Jul 2018 14:06:09 +0200
Post by Frans de Boer
> Hazel, sorry, but where should I remove phys_to_virt()? If I delete the
> complete if statement in the iounmap function and replace it with the
> above code, I get compile errors.
Frans,

Insert the following statement (do not remove anything):

    if (is_ISA_range(phys_addr, last_addr))
        return (__force void __iomem *)phys_to_virt(phys_addr);

at around line 106 of arch/x86/mm/ioremap.c, just before the lines:

    /*
     * Don't allow anybody to remap normal RAM that we're using..
     */
    pfn = phys_addr >> PAGE_SHIFT;

You can see how the older code was altered here:

https://patchwork.kernel.org/patch/9847859/


Mike
Hazel Russman
2018-07-17 13:15:49 UTC
On Tue, 17 Jul 2018 14:06:09 +0200
Post by Frans de Boer
> Hazel, sorry, but where should I remove phys_to_virt()? If I delete the
> complete if statement in the iounmap function and replace it with the
> above code, I get compile errors.
> By the way, acpi=off does not solve the issue either.
No, it's the other way around. phys_to_virt() doesn't get removed; it gets inserted/reinserted just above the warning not to let normal RAM be remapped. This is code that was in the kernel before but someone took it out and that was what was causing me all that trouble.

Here's the patch that I made:

--- linux-4.13.0-rc1/arch/x86/mm/ioremap.c	2018-07-14 13:27:21.000000000 +0100
+++ linux-4.13.0-rc1.new/arch/x86/mm/ioremap.c	2018-07-14 16:00:14.071456762 +0100
@@ -103,7 +103,12 @@
 		       (unsigned long long)phys_addr);
 		WARN_ON_ONCE(1);
 		return NULL;
-	}
+	}
+/* Don't remap the low PCI/ISA area, it's always mapped..
+ */
+	if (is_ISA_range(phys_addr, last_addr))
+		return (__force void __iomem *)phys_to_virt(phys_addr);
+
 
 	/*
 	 * Don't allow anybody to remap normal RAM that we're using..

Sorry if this is a bit inexpert. I'm not used to creating patches and I did the actual edit by hand.

I didn't touch anything else in that file. And it built normally with just that edit.
Frans de Boer
2018-07-19 11:54:19 UTC
Post by Hazel Russman
> No, it's the other way around. phys_to_virt() doesn't get removed; it gets inserted/reinserted just above the warning not to let normal RAM be remapped. This is code that was in the kernel before but someone took it out and that was what was causing me all that trouble.
> [...]
> I didn't touch anything else in that file. And it built normally with just that edit.
Hello Hazel,

What you inserted is already present in the 4.13.0 release. But I
can't compile 4.13 anymore because I now have gcc 8.1 instead of the
former 7 series.

I will continue my search and move on to 4.14, where the check has been
removed. But I guess that will fail too, and in any case it is no
solution to my problem with systemd freezing just after it detects that
it is running on a VM.

--- Frans
Hazel Russman
2018-07-19 12:57:49 UTC
On Thu, 19 Jul 2018 13:54:19 +0200
Post by Frans de Boer
> What you inserted is already present in the 4.13.0 release. But I
> can't compile 4.13 anymore because I now have gcc 8.1 instead of the
> former 7 series.
> I will continue my search and move on to 4.14, where the check has been
> removed. But I guess that will fail too, and in any case it is no
> solution to my problem with systemd freezing just after it detects that
> it is running on a VM.
Yes, I can boot 4.13 kernels without any problems. But I wanted an LTS kernel that keeps up with fixes for the newest exploits (especially Meltdown), and the next LTS after 4.9 is 4.14. I'm using bare iron, not a VM (and no systemd!), but it's rather old hardware. The processor is an Intel Core Duo. I can send you the cpuinfo if you want it.

I suspect that if you did build 4.14, it would behave properly; after all, it does for most people. I have 4.15 on my laptop (which has a VIA Nano processor) and no problems there. But I'd be happy to carry out any exploratory tests you like on my desktop, since that's the machine that misbehaves.
--
Hazel
Frans de Boer
2018-07-19 13:32:46 UTC
Post by Hazel Russman
> Yes, I can boot 4.13 kernels without any problems. But I wanted an LTS kernel that keeps up with fixes for the newest exploits (especially Meltdown), and the next LTS after 4.9 is 4.14. I'm using bare iron, not a VM (and no systemd!), but it's rather old hardware. The processor is an Intel Core Duo. I can send you the cpuinfo if you want it.
> I suspect that if you did build 4.14, it would behave properly; after all, it does for most people. I have 4.15 on my laptop (which has a VIA Nano processor) and no problems there. But I'd be happy to carry out any exploratory tests you like on my desktop, since that's the machine that misbehaves.
Hello Hazel,

I get the impression you have been sent to me with the wrong
information/background. I have no problem running things on bare metal;
my problem is with LFS and running systemd on a VM, as explained in the
thread 'Booting LFS with Systemd'.
I know that Bruce uses bare metal too, but why not use VMs when one
can continue developing without having to reboot into an incomplete
system environment? Also, if one has multiple systems to spare, bare
metal can be a way. If not, VMs are a welcome solution.

So I think that I am chasing the wrong ghost and will have a talk with
the systemd developers instead. Despite the lack of interest in using
VMs, I shall share any positive result with the LFS list.

Discussion closed.

Regards Frans.
Bruce Dubbs
2018-07-19 16:19:33 UTC
Post by Frans de Boer
> I know that Bruce uses bare metal too, but why not use VMs when one
> can continue developing without having to reboot into an incomplete
> system environment? Also, if one has multiple systems to spare, bare
> metal can be a way. If not, VMs are a welcome solution.
I generally build on a dedicated development system accessed via ssh.
That accomplishes the same level of convenience as a VM, but I prefer
validating LFS on a real system.

-- Bruce
Michael Shell
2018-07-19 19:34:04 UTC
On Thu, 19 Jul 2018 13:54:19 +0200
Post by Frans de Boer
> But I can't compile 4.13 anymore because I now have gcc 8.1 instead
> of the former 7 series.
Frans,

What goes wrong when you try to build a 4.13 kernel with gcc 8.1?
It should work, right?

Are there any good reasons not to use a gcc 8 series kernel?


Cheers,

Mike
Frans de Boer
2018-07-19 19:54:20 UTC
Post by Michael Shell
> What goes wrong when you try to build a 4.13 kernel with gcc 8.1?
> It should work, right?
> Are there any good reasons not to use a gcc 8 series kernel?
I get a syntax error when compiling pager.c. I had this before and
remembered that gcc 8.1 is less forgiving than the 7 series. So, I tried
to compile the kernel within the LFS development (systemd) environment,
which ended with said error.

Next I tried 4.14.0 and all went well. That said, I will look
elsewhere; maybe something changed in either systemd (234-8) or the
kernel after 4.13.x. I don't believe that this is the right thread anymore.

I will start by making VMs from fresh images of various recent
distributions to see if the same problem occurs there. If not, then it
must be an LFS problem.

-- Frans.
Michael Shell
2018-07-20 06:48:33 UTC
On Thu, 19 Jul 2018 21:54:20 +0200
Post by Frans de Boer
> I get a syntax error when compiling pager.c. I had this before and
> remembered that gcc 8.1 is less forgiving than the 7 series.
Frans,

FWIW, there was some discussion about gcc 8.1 issues on the kernel
mailing list:

https://lkml.org/lkml/2018/5/5/181

But it seems the problems they mention are just warnings: annoying,
but they don't actually break anything.

There are some gcc 8.1 kernel patches here:

https://git.kernel.org/pub/scm/linux/kernel/git/jpoimboe/linux.git/log/?h=objtool-gcc8

The ones of interest are the top two, with "Support GCC 8" in the title.

But, I think you'll be more interested in the other set of
GCC 8 patches here:

https://patchwork.openembedded.org/patch/151479/

Two files in the kernel source are changed:

tools/lib/str_error_r.c
tools/lib/subcmd/pager.c

So, there is your pager.c problem.

Looking at the patch code, it seems there are/were potential problems
in the kernel code which gcc 8.1 detects and warns (or errors) about.
But, gcc 8 itself isn't the cause of these issues.
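
One plausible instance of this class of problem, sketched under stated assumptions: select() declares its fd_set parameters as restrict-qualified, so passing the same set as both the read set and the exception set is the kind of call that gcc 8's -Wrestrict flags, and a build using -Werror turns that into a hard failure. Whether this matches the pager.c change in the linked patch exactly is an assumption. The code below illustrates the pattern and is not a copy of the patch; wait_until_readable() and its self-check are hypothetical names.

```c
#include <assert.h>
#include <sys/select.h>
#include <sys/time.h>
#include <unistd.h>

/*
 * Hypothetical helper: wait up to timeout_ms for fd to become readable
 * or to report an exceptional condition. Returns select()'s result.
 *
 * The aliasing form would be:
 *     select(fd + 1, &in, NULL, &in, &tv);
 * Using a separate set for the exception argument avoids handing the
 * same object to two restrict-qualified parameters.
 */
static inline int wait_until_readable(int fd, int timeout_ms)
{
    fd_set in;
    fd_set exception;
    struct timeval tv;

    FD_ZERO(&in);
    FD_ZERO(&exception);
    FD_SET(fd, &in);
    FD_SET(fd, &exception);

    tv.tv_sec = timeout_ms / 1000;
    tv.tv_usec = (timeout_ms % 1000) * 1000;

    return select(fd + 1, &in, NULL, &exception, &tv);
}

/* Small self-check on a pipe; can be called from main(). */
static inline void wait_until_readable_self_check(void)
{
    int fds[2];
    int rc = pipe(fds);
    ssize_t written;

    assert(rc == 0);
    assert(wait_until_readable(fds[0], 10) == 0);    /* nothing written yet */

    written = write(fds[1], "x", 1);
    assert(written == 1);
    assert(wait_until_readable(fds[0], 1000) == 1);  /* one byte pending */

    close(fds[0]);
    close(fds[1]);
    (void)rc;
    (void)written;
}
```

Both forms behave the same at run time on common C libraries. The difference is only that the aliasing form breaks the restrict contract, which newer compilers report.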
Post by Frans de Boer
> I don't believe that this is the right thread anymore.
Yeah, but the recent little odds and ends here sure are interesting.


Cheers,

Mike