Ticket #115 (closed defect: fixed)

Opened 2 years ago

Last modified 20 months ago

init fails to stop when OpenVZ container is stopped

Reported by: Arjan Schrijver <arjan@anymore.nl> Owned by: roy
Priority: normal Milestone:
Component: rc Version:
Keywords: Cc: pva@gentoo.org

Description (last modified by roy)

When I stop the container, init stops every process except itself. The init process keeps running after the rest of the processes are terminated.
It keeps running until it gets killed by OpenVZ after 2 minutes.

This is the strace output when I attach strace to the init process after all other processes have been stopped:

time(NULL) = 1222066161
stat64("/dev/initctl", {st_mode=S_IFIFO|0600, st_size=0, ...}) = 0
fstat64(10, {st_mode=S_IFIFO|0600, st_size=0, ...}) = 0
stat64("/dev/initctl", {st_mode=S_IFIFO|0600, st_size=0, ...}) = 0
select(11, [10], NULL, NULL, {5, 0}) = 0 (Timeout)
time(NULL) = 1222066166
stat64("/dev/initctl", {st_mode=S_IFIFO|0600, st_size=0, ...}) = 0
fstat64(10, {st_mode=S_IFIFO|0600, st_size=0, ...}) = 0
stat64("/dev/initctl", {st_mode=S_IFIFO|0600, st_size=0, ...}) = 0
select(11, [10], NULL, NULL, {5, 0}) = 0 (Timeout)
time(NULL) = 1222066171
stat64("/dev/initctl", {st_mode=S_IFIFO|0600, st_size=0, ...}) = 0
fstat64(10, {st_mode=S_IFIFO|0600, st_size=0, ...}) = 0
stat64("/dev/initctl", {st_mode=S_IFIFO|0600, st_size=0, ...}) = 0
select(11, [10], NULL, NULL, {5, 0}) = 0 (Timeout)

As you can see from the Unix timestamps, these blocks of activity happen every 5 seconds. This is the same activity as when init is running normally. It looks to me like init doesn't even know it should terminate.
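The 5-second interval can be checked mechanically from a saved trace. What follows is only a minimal sketch, assuming Python is available on the hardware node. The helper names poll_intervals and select_timeouts are hypothetical, and the excerpt contains only the time() and select() lines from the trace quoted above.

```python
import re

# Excerpt of the strace output quoted in this ticket (time/select lines only).
STRACE_EXCERPT = """\
time(NULL) = 1222066161
select(11, [10], NULL, NULL, {5, 0}) = 0 (Timeout)
time(NULL) = 1222066166
select(11, [10], NULL, NULL, {5, 0}) = 0 (Timeout)
time(NULL) = 1222066171
select(11, [10], NULL, NULL, {5, 0}) = 0 (Timeout)
"""


def poll_intervals(strace_text):
    """Return the gaps in seconds between consecutive time(NULL) results."""
    stamps = [
        int(match.group(1))
        for match in re.finditer(
            r"^time\(NULL\)\s*=\s*(\d+)", strace_text, re.MULTILINE
        )
    ]
    return [later - earlier for earlier, later in zip(stamps, stamps[1:])]


def select_timeouts(strace_text):
    """Count select() calls that returned 0, i.e. timed out without input."""
    pattern = r"^select\(.*\)\s*=\s*0 \(Timeout\)"
    return len(re.findall(pattern, strace_text, re.MULTILINE))


if __name__ == "__main__":
    print("seconds between polls:", poll_intervals(STRACE_EXCERPT))
    print("select() timeouts:", select_timeouts(STRACE_EXCERPT))
```

Every select() on descriptor 10 times out, so nothing is being written to /dev/initctl while init keeps polling.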

Change History

comment:1 Changed 23 months ago by arjan@anymore.nl

When 'halt' or 'reboot' is called from inside the container, OpenVZ doesn't know that the container is rebooting or halting, and then the init process is never terminated after stopping all other processes. The container can only be stopped by calling 'vzctl <cntnr> stop' from the OpenVZ master server.

comment:2 Changed 23 months ago by roy

  • Description modified

So what is OpenRC supposed to do then? I doubt that it could notify the master server from inside the container - can it?

comment:3 Changed 23 months ago by arjan@anymore.nl

It's not really clear what exactly should happen, but a simple (?) fix would be for init to kill itself after stopping all other processes. I've contacted the OpenVZ-devel mailing list about this issue. If they can help me, I will relay the information to you. For the moment, a patch that makes init kill itself when it is shut down or rebooted in an OpenVZ container would help me a lot. If you can't provide that, I would like to ask you to help me write the patch.

comment:4 Changed 22 months ago by pva@gentoo.org

  • Cc pva@gentoo.org added

Well, this does not happen here on any of the systems I have. Which init do you have installed inside the container? Which OpenRC version? Which OpenVZ kernel version?

comment:5 Changed 22 months ago by arjan@anymore.nl

OpenVZ kernel version is sys-kernel/openvz-sources-2.6.24.005.1
/sbin/init inside the container belongs to sys-apps/sysvinit-2.86-r10
OpenRC inside the container is sys-apps/openrc-0.3.0-r1

All packages are installed from the normal portage tree using emerge.

If necessary I can provide a root login to a hardware node so you can test.

comment:6 Changed 22 months ago by pva@gentoo.org

Well, first try to update your kernel. 2.6.24.005 is quite old and has many different bugs... If the problem persists, I'll try to reproduce the problem on a 2.6.24.006 kernel.

comment:7 Changed 22 months ago by arjan@anymore.nl

Which kernel do you suggest I try? This is the latest kernel available from Portage. I have no problem trying a 2.6.26 openvz development kernel, but I have to keep in mind that this is meant for production servers. I will try the 2.6.24.006.2 kernel now.
Also, with the stable 2.6.18.028.057.2 kernel, I have exactly the same problem.

comment:8 Changed 22 months ago by arjan@anymore.nl

Excuse me, I forgot to sync before I posted my last comment. I see 2.6.18.028.057.2 and 2.6.24.006.2 are in Portage now. I will try them immediately.

comment:9 Changed 22 months ago by pva@gentoo.org

2.6.18.028.057.2 was never stable. For production I suggest using a stable 2.6.18 kernel and, in case it does not support the hardware you need, moving to an unstable 2.6.18 kernel (these are based on the patchset from RHEL5).
The 2.6.26 kernels are purely development versions and are actually a live kernel based on the 2.6.26 branch in git. Use them if you wish to help development or report bugs.
The 2.6.24 kernels are somewhere in between. They can be used in production, and some distributions (e.g. SUSE) use them, but history has revealed many interesting bugs in them...

Note that 2.6.24 and 2.6.26 are hardmasked ;)

comment:10 Changed 22 months ago by arjan@anymore.nl

Okay. I have now tried the most recent stable (2.6.18.028.056.1) and unstable (2.6.24.006.2) versions, but it doesn't make any difference.

comment:11 Changed 20 months ago by arjan@anymore.nl

Roy, you are great!
OpenRC-0.4.0 finally fixes this issue!

comment:12 Changed 20 months ago by roy

  • Status changed from new to closed
  • Resolution set to fixed

Sweet :)
Probably due to moving the sysvinit shutdown code out of OpenRC.
