KVM Virtualization: Using kvmtool, a Small Virtual Machine Tool
创始人
2024-05-18 21:52:54

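According to the documentation in the kvmtool GitHub repository, kvmtool, like QEMU, is a host userspace virtual machine tool that hosts KVM guest operating systems. As a pure full-virtualization tool, it runs guest operating systems without modifying them. However, because KVM relies on the CPU's hardware virtualization support, it only supports guests of the same architecture as the host.

kvmtool is a clean, from-scratch, lightweight virtualization tool with only about 5 KLOC of code, which makes it very approachable for anyone who wants to learn virtualization. Implemented as a KVM host tool, it can boot Linux guest images without a BIOS or other related dependencies. Below we set up a kvmtool environment on Ubuntu 22.04 and run another Linux system in a virtual machine.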

Host environment

The host system used in this experiment is Ubuntu 22.04; see the figure below for details:

Downloading the code

Download kvmtool:

$ git clone https://github.com/kvmtool/kvmtool.git

Download busybox:

$ wget https://busybox.net/downloads/busybox-1.32.0.tar.bz2

Download the Linux kernel:

$ axel -a -n 80 https://www.kernel.org/pub/linux/kernel/v5.x/linux-5.15.18.tar.gz

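When choosing versions, it is enough to pick tool and source releases from roughly the same period; the exact versions do not matter much.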

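Building kvmtool

This experiment uses kvmtool at commit e17d182ad3f797f01947fc234d95c96c050c534b. The build is straightforward: enter the kvmtool directory and run make:

The build produces the executable lkvm and also creates vm, a hard link to lkvm. The two are identical.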
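One way to confirm that vm really is a hard link to lkvm, and not a copy or a symbolic link, is to check that both names refer to the same file. The helper name is_hard_link below is made up for this illustration, and the -ef test is an extension supported by bash and dash, so treat this as a sketch under those assumptions.

```shell
# is_hard_link A B: succeed if A and B are two names for the same file
# (same device and inode) and neither name is a symbolic link.
is_hard_link() {
    [ -e "$1" ] && [ -e "$2" ] || return 1
    # A symlink would also pass -ef, so rule it out explicitly.
    [ ! -L "$1" ] && [ ! -L "$2" ] || return 1
    [ "$1" -ef "$2" ]
}

# Example, run from the kvmtool directory after make:
#   is_hard_link lkvm vm && echo "vm is a hard link to lkvm"
```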

Building the Linux kernel

Building the kernel is simple; see this blog post:

https://blog.csdn.net/tugouxp/article/details/117616804?spm=1001.2014.3001.5502

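Three points need attention here:

  1. Fix the build errors caused by missing .pem files; there are two of them.

  2. Only the bzImage target needs to be built; modules are not required.

  3. The default menuconfig settings are sufficient; the KVM and VIRTIO options are already enabled.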
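The post does not say which two .pem errors appeared. A common cause with Ubuntu-derived kernel configurations, which is assumed here, is that CONFIG_SYSTEM_TRUSTED_KEYS and CONFIG_SYSTEM_REVOCATION_KEYS point at debian/*.pem files that a vanilla source tree does not contain. Under that assumption, blanking both options in .config is one way past the errors. The helper name clear_cert_paths is hypothetical.

```shell
# clear_cert_paths [CONFIG]: blank the two certificate-path options in a
# kernel .config file (default: ./.config).
# Assumption: the .pem build errors come from these two options.
clear_cert_paths() {
    cfg=${1:-.config}
    [ -f "$cfg" ] || { echo "no such config: $cfg" >&2; return 1; }
    for opt in SYSTEM_TRUSTED_KEYS SYSTEM_REVOCATION_KEYS; do
        # Replace whatever path is configured with an empty string.
        sed -i "s|^CONFIG_${opt}=.*|CONFIG_${opt}=\"\"|" "$cfg"
    done
}

# Example, from the kernel source directory:
#   clear_cert_paths .config && make olddefconfig && make -j"$(nproc)" bzImage
```

If the errors turn out to have a different cause, the function leaves any unrelated options in .config untouched.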

The build finally produces the bzImage file:

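Building busybox

Build the root filesystem from busybox and create its directory structure; see this blog post:

https://blog.csdn.net/tugouxp/article/details/124434243

Note that after completing the steps in that post, the linuxrc file in the top-level directory must be renamed to init, because the kernel looks for /init when it boots from an initramfs.

Then pack the rootfs directory into a cpio archive.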

$ find . ! -name root_fs.cpio | cpio -o --format=newc > root_fs.cpio
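The two steps above (rename linuxrc to init, then pack the tree) can be sketched as a pair of shell functions. The function names are illustrative and not taken from the post; the cpio options are the same ones used in the command above. Writing the archive outside the tree being packed is a choice made here so that the archive cannot end up containing itself.

```shell
# use_init_name ROOTFS: make sure ROOTFS has a top-level init.
# busybox installs a top-level linuxrc; it is renamed if init is absent.
use_init_name() {
    root=$1
    if [ ! -e "$root/init" ]; then
        if [ -e "$root/linuxrc" ] || [ -L "$root/linuxrc" ]; then
            mv "$root/linuxrc" "$root/init"
        fi
    fi
    # Succeed only if init now exists (following a symlink if it is one).
    [ -e "$root/init" ]
}

# pack_rootfs ROOTFS ARCHIVE: create a newc-format cpio archive of ROOTFS.
# ARCHIVE should live outside ROOTFS.
pack_rootfs() {
    (cd "$1" && find . | cpio -o --format=newc) > "$2"
}

# Example (paths are placeholders):
#   use_init_name _install && pack_rootfs _install ./root_fs.cpio
```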

The resulting directory structure is as follows:

With the three steps above complete, the virtual machine is ready to run.

Running the virtual machine

Before running, confirm that the /dev/kvm device node exists on the host.
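A minimal sketch of that check follows. The function name kvm_node_ok is made up for this example, and the hint about the kvm group assumes a distribution such as Ubuntu, where /dev/kvm is owned by that group.

```shell
# kvm_node_ok [PATH]: succeed if PATH (default /dev/kvm) is a character
# device that the current user can both read and write.
kvm_node_ok() {
    node=${1:-/dev/kvm}
    if [ ! -c "$node" ]; then
        echo "$node: not present; check hardware virtualization and the kvm module" >&2
        return 1
    fi
    if [ ! -r "$node" ] || [ ! -w "$node" ]; then
        echo "$node: present but not accessible; run as root or join the kvm group" >&2
        return 1
    fi
    return 0
}

# Example:
#   kvm_node_ok && echo "/dev/kvm is usable"
```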

Run the virtual machine with the following command:

$ sudo ./lkvm run -k ../linux-5.15.18/arch/x86/boot/bzImage -i ../busybox-1.32.0/_install/root_fs.cpio
  # lkvm run -k ../linux-5.15.18/arch/x86/boot/bzImage -m 704 -c 8 --name guest-100110
[    0.000000] Linux version 5.15.18 (zlcao@zlcao-RedmiBook-14) (gcc (Ubuntu 11.3.0-1ubuntu1~22.04) 11.3.0, GNU ld (GNU Binutils for Ubuntu) 2.38) #1 SMP Fri Jan 27 12:27:51 CST 2023
[    0.000000] Command line: noapic noacpi pci=conf1 reboot=k panic=1 i8042.direct=1 i8042.dumbkbd=1 i8042.nopnp=1 earlyprintk=serial i8042.noaux=1 console=ttyS0 root=/dev/vda rw 
[    0.000000] KERNEL supported cpus:
[    0.000000]   Intel GenuineIntel
[    0.000000]   AMD AuthenticAMD
[    0.000000]   Hygon HygonGenuine
[    0.000000]   Centaur CentaurHauls
[    0.000000]   zhaoxin   Shanghai  
[    0.000000] x86/fpu: Supporting XSAVE feature 0x001: 'x87 floating point registers'
[    0.000000] x86/fpu: Supporting XSAVE feature 0x002: 'SSE registers'
[    0.000000] x86/fpu: Supporting XSAVE feature 0x004: 'AVX registers'
[    0.000000] x86/fpu: Supporting XSAVE feature 0x008: 'MPX bounds registers'
[    0.000000] x86/fpu: Supporting XSAVE feature 0x010: 'MPX CSR'
[    0.000000] x86/fpu: xstate_offset[2]:  576, xstate_sizes[2]:  256
[    0.000000] x86/fpu: xstate_offset[3]:  832, xstate_sizes[3]:   64
[    0.000000] x86/fpu: xstate_offset[4]:  896, xstate_sizes[4]:   64
[    0.000000] x86/fpu: Enabled xstate features 0x1f, context size is 960 bytes, using 'compacted' format.
[    0.000000] signal: max sigframe size: 2032
[    0.000000] BIOS-provided physical RAM map:
[    0.000000] BIOS-e820: [mem 0x0000000000000000-0x000000000009fbff] usable
[    0.000000] BIOS-e820: [mem 0x000000000009fc00-0x000000000009ffff] reserved
[    0.000000] BIOS-e820: [mem 0x00000000000f0000-0x00000000000ffffe] reserved
[    0.000000] BIOS-e820: [mem 0x0000000000100000-0x000000002bffffff] usable
[    0.000000] printk: bootconsole [earlyser0] enabled
[    0.000000] ERROR: earlyprintk= earlyser already used
[    0.000000] NX (Execute Disable) protection: active
[    0.000000] DMI not present or invalid.
[    0.000000] Hypervisor detected: KVM
[    0.000000] kvm-clock: Using msrs 4b564d01 and 4b564d00
[    0.000000] kvm-clock: cpu 0, msr 11c01001, primary cpu clock
[    0.000004] kvm-clock: using sched offset of 198180346 cycles
[    0.000522] clocksource: kvm-clock: mask: 0xffffffffffffffff max_cycles: 0x1cd42e4dffb, max_idle_ns: 881590591483 ns
[    0.002007] tsc: Detected 1992.002 MHz processor
[    0.002444] last_pfn = 0x2c000 max_arch_pfn = 0x400000000
[    0.002986] Disabled
[    0.003182] x86/PAT: MTRRs disabled, skipping PAT initialization too.
[    0.003765] CPU MTRRs all blank - virtualized system.
[    0.004236] x86/PAT: Configuration [0-7]: WB  WT  UC- UC  WB  WT  UC- UC  
Memory KASLR using RDRAND RDTSC...
[    0.005590] found SMP MP-table at [mem 0x000f03b0-0x000f03bf]
[    0.006456] Using GB pages for direct mapping
[    0.007160] RAMDISK: [mem 0x2bd00000-0x2bf83fff]
[    0.007640] ACPI: Early table checksum verification disabled
[    0.008311] ACPI BIOS Error (bug): A valid RSDP was not found (20210730/tbxfroot-210)
[    0.009234] No NUMA configuration found
[    0.009526] Faking a node at [mem 0x0000000000000000-0x000000002bffffff]
[    0.010001] NODE_DATA(0) allocated [mem 0x2bfd6000-0x2bffffff]
[    0.010937] Zone ranges:
[    0.011122]   DMA      [mem 0x0000000000001000-0x0000000000ffffff]
[    0.011581]   DMA32    [mem 0x0000000001000000-0x000000002bffffff]
[    0.012074]   Normal   empty
[    0.012351]   Device   empty
[    0.012626] Movable zone start for each node
[    0.012971] Early memory node ranges
[    0.013292]   node   0: [mem 0x0000000000001000-0x000000000009efff]
[    0.013732]   node   0: [mem 0x0000000000100000-0x000000002bffffff]
[    0.014192] Initmem setup node 0 [mem 0x0000000000001000-0x000000002bffffff]
[    0.014710] On node 0, zone DMA: 1 pages in unavailable ranges
[    0.014878] On node 0, zone DMA: 97 pages in unavailable ranges
[    0.022910] On node 0, zone DMA32: 16384 pages in unavailable ranges
[    0.023633] Intel MultiProcessor Specification v1.4
[    0.024453] MPTABLE: OEM ID: KVMCPU00
[    0.024719] MPTABLE: Product ID: 0.1         
[    0.025000] MPTABLE: APIC at: 0xFEE00000
[    0.025279] Processor #0 (Bootup-CPU)
[    0.025527] Processor #1
[    0.025698] Processor #2
[    0.025861] Processor #3
[    0.026025] Processor #4
[    0.026191] Processor #5
[    0.026356] Processor #6
[    0.026521] Processor #7
[    0.026715] IOAPIC[0]: apic_id 9, version 17, address 0xfec00000, GSI 0-23
[    0.027163] Processors: 8
[    0.027344] smpboot: Allowing 8 CPUs, 0 hotplug CPUs
[    0.027735] kvm-guest: KVM setup pv remote TLB flush
[    0.028059] kvm-guest: setup PV sched yield
[    0.028372] PM: hibernation: Registered nosave memory: [mem 0x00000000-0x00000fff]
[    0.028859] PM: hibernation: Registered nosave memory: [mem 0x0009f000-0x0009ffff]
[    0.029349] PM: hibernation: Registered nosave memory: [mem 0x000a0000-0x000effff]
[    0.029843] PM: hibernation: Registered nosave memory: [mem 0x000f0000-0x000fefff]
[    0.030330] PM: hibernation: Registered nosave memory: [mem 0x000ff000-0x000fffff]
[    0.030820] [mem 0x2c000000-0xffffffff] available for PCI devices
[    0.031217] Booting paravirtualized kernel on KVM
[    0.031546] clocksource: refined-jiffies: mask: 0xffffffff max_cycles: 0xffffffff, max_idle_ns: 7645519600211568 ns
[    0.032234] setup_percpu: NR_CPUS:8192 nr_cpumask_bits:8 nr_cpu_ids:8 nr_node_ids:1
[    0.034042] percpu: Embedded 61 pages/cpu s212992 r8192 d28672 u262144
[    0.034524] kvm-guest: setup async PF for cpu 0
[    0.034866] kvm-guest: stealtime: cpu 0, msr 2ae33080
[    0.035203] kvm-guest: PV spinlocks enabled
[    0.035483] PV qspinlock hash table entries: 256 (order: 0, 4096 bytes, linear)
[    0.035994] Built 1 zonelists, mobility grouping on.  Total pages: 177152
[    0.036454] Policy zone: DMA32
[    0.036658] Kernel command line: noapic noacpi pci=conf1 reboot=k panic=1 i8042.direct=1 i8042.dumbkbd=1 i8042.nopnp=1 earlyprintk=serial i8042.noaux=1 console=ttyS0 root=/dev/vda rw 
[    0.037994] Unknown kernel command line parameters "noacpi", will be passed to user space.
[    0.039146] Dentry cache hash table entries: 131072 (order: 8, 1048576 bytes, linear)
[    0.039968] Inode-cache hash table entries: 65536 (order: 7, 524288 bytes, linear)
[    0.040621] mem auto-init: stack:off, heap alloc:on, heap free:off
[    0.045493] Memory: 657968K/720504K available (16393K kernel code, 4387K rwdata, 10492K rodata, 2932K init, 4816K bss, 62276K reserved, 0K cma-reserved)
[    0.046448] random: get_random_u64 called from __kmem_cache_create+0x2f/0x520 with crng_init=0
[    0.046702] SLUB: HWalign=64, Order=0-3, MinObjects=0, CPUs=8, Nodes=1
[    0.047702] ftrace: allocating 47928 entries in 188 pages
[    0.064484] ftrace: allocated 188 pages with 5 groups
[    0.065149] rcu: Hierarchical RCU implementation.
[    0.065448] rcu:     RCU restricting CPUs from NR_CPUS=8192 to nr_cpu_ids=8.
[    0.065873]     Rude variant of Tasks RCU enabled.
[    0.066157]     Tracing variant of Tasks RCU enabled.
[    0.066456] rcu: RCU calculated value of scheduler-enlistment delay is 25 jiffies.
[    0.066930] rcu: Adjusting geometry for rcu_fanout_leaf=16, nr_cpu_ids=8
[    0.070850] NR_IRQS: 524544, nr_irqs: 488, preallocated irqs: 16
[    0.071549] random: crng done (trusting CPU's manufacturer)
[    0.071979] Console: colour *CGA 80x25
[    0.072283] printk: console [ttyS0] enabled
[    0.072283] printk: console [ttyS0] enabled
[    0.072969] printk: bootconsole [earlyser0] disabled
[    0.072969] printk: bootconsole [earlyser0] disabled
[    0.073921] APIC: Switch to symmetric I/O mode setup
[    0.074351] Not enabling interrupt remapping due to skipped IO-APIC setup
[    0.075319] kvm-guest: setup PV IPIs
[    0.075970] clocksource: tsc-early: mask: 0xffffffffffffffff max_cycles: 0x396d566cf43, max_idle_ns: 881590760263 ns
[    0.076947] Calibrating delay loop (skipped) preset value.. 3984.00 BogoMIPS (lpj=7968008)
[    0.077665] pid_max: default: 32768 minimum: 301
[    0.081003] LSM: Security Framework initializing
[    0.081417] landlock: Up and running.
[    0.081733] Yama: becoming mindful.
[    0.082087] AppArmor: AppArmor initialized
[    0.082481] Mount-cache hash table entries: 2048 (order: 2, 16384 bytes, linear)
[    0.083131] Mountpoint-cache hash table entries: 2048 (order: 2, 16384 bytes, linear)
Poking KASLR using RDRAND RDTSC...
[    0.085044] x86/cpu: User Mode Instruction Prevention (UMIP) activated
[    0.085971] Last level iTLB entries: 4KB 64, 2MB 8, 4MB 8
[    0.086434] Last level dTLB entries: 4KB 64, 2MB 0, 4MB 0, 1GB 4
[    0.086991] Spectre V1 : Mitigation: usercopy/swapgs barriers and __user pointer sanitization
[    0.088961] Spectre V2 : Mitigation: Full generic retpoline
[    0.089424] Spectre V2 : Spectre v2 / SpectreRSB mitigation: Filling RSB on context switch
[    0.090121] Spectre V2 : Enabling Restricted Speculation for firmware calls
[    0.090713] Spectre V2 : mitigation: Enabling conditional Indirect Branch Prediction Barrier
[    0.091429] Spectre V2 : User space: Mitigation: STIBP via seccomp and prctl
[    0.092018] Speculative Store Bypass: Mitigation: Speculative Store Bypass disabled via prctl and seccomp
[    0.092950] SRBDS: Unknown: Dependent on hypervisor status
[    0.093435] MDS: Mitigation: Clear CPU buffers
[    0.101335] Freeing SMP alternatives memory: 40K
[    0.318017] smpboot: CPU0: Intel 06/8e (family: 0x6, model: 0x8e, stepping: 0xb)
[    0.319105] Performance Events: Skylake events, 32-deep LBR, full-width counters, Intel PMU driver.
[    0.321782] ... version:                2
[    0.322127] ... bit width:              48
[    0.322481] ... generic registers:      4
[    0.322819] ... value mask:             0000ffffffffffff
[    0.323267] ... max period:             00007fffffffffff
[    0.323719] ... fixed-purpose events:   3
[    0.324941] ... event mask:             000000070000000f
[    0.325598] rcu: Hierarchical SRCU implementation.
[    0.327121] smp: Bringing up secondary CPUs ...
[    0.327742] x86: Booting SMP configuration:
[    0.328094] .... node  #0, CPUs:      #1
[    0.009568] kvm-clock: cpu 1, msr 11c01041, secondary cpu clock
[    0.329211] kvm-guest: setup async PF for cpu 1
[    0.329667] kvm-guest: stealtime: cpu 1, msr 2ae73080
[    0.330021]  #2
[    0.009568] kvm-clock: cpu 2, msr 11c01081, secondary cpu clock
[    0.009568] [Firmware Bug]: CPU2: APIC id mismatch. Firmware: 2 APIC: 7
[    0.331227] kvm-guest: setup async PF for cpu 2
[    0.331227] kvm-guest: stealtime: cpu 2, msr 2aeb3080
[    0.333172]  #3
[    0.009568] kvm-clock: cpu 3, msr 11c010c1, secondary cpu clock
[    0.009568] [Firmware Bug]: CPU3: APIC id mismatch. Firmware: 3 APIC: 7
[    0.334905] kvm-guest: setup async PF for cpu 3
[    0.334905] kvm-guest: stealtime: cpu 3, msr 2aef3080
[    0.334905] MDS CPU bug present and SMT on, data leak possible. See https://www.kernel.org/doc/html/latest/admin-guide/hw-vuln/mds.html for more details.
[    0.337190]  #4
[    0.009568] kvm-clock: cpu 4, msr 11c01101, secondary cpu clock
[    0.009568] [Firmware Bug]: CPU4: APIC id mismatch. Firmware: 4 APIC: 1
[    0.339458] kvm-guest: setup async PF for cpu 4
[    0.339458] kvm-guest: stealtime: cpu 4, msr 2af33080
[    0.341165]  #5
[    0.009568] kvm-clock: cpu 5, msr 11c01141, secondary cpu clock
[    0.009568] [Firmware Bug]: CPU5: APIC id mismatch. Firmware: 5 APIC: 0
[    0.343159] kvm-guest: setup async PF for cpu 5
[    0.343159] kvm-guest: stealtime: cpu 5, msr 2af73080
[    0.345078]  #6
[    0.009568] kvm-clock: cpu 6, msr 11c01181, secondary cpu clock
[    0.009568] [Firmware Bug]: CPU6: APIC id mismatch. Firmware: 6 APIC: 7
[    0.346579] kvm-guest: setup async PF for cpu 6
[    0.346579] kvm-guest: stealtime: cpu 6, msr 2afb3080
[    0.346579]  #7
[    0.009568] kvm-clock: cpu 7, msr 11c011c1, secondary cpu clock
[    0.009568] [Firmware Bug]: CPU7: APIC id mismatch. Firmware: 7 APIC: 6
[    0.349375] kvm-guest: setup async PF for cpu 7
[    0.349375] kvm-guest: stealtime: cpu 7, msr 2aff3080
[    0.349687] smp: Brought up 1 node, 8 CPUs
[    0.349687] smpboot: Max logical packages: 1
[    0.349897] smpboot: Total of 8 processors activated (31872.03 BogoMIPS)
[    0.353085] devtmpfs: initialized
[    0.353355] x86/mm: Memory block size: 128MB
[    0.354192] clocksource: jiffies: mask: 0xffffffff max_cycles: 0xffffffff, max_idle_ns: 7645041785100000 ns
[    0.354192] futex hash table entries: 2048 (order: 5, 131072 bytes, linear)
[    0.354845] pinctrl core: initialized pinctrl subsystem
[    0.357228] PM: RTC time: 05:49:54, date: 2023-01-27
[    0.358851] NET: Registered PF_NETLINK/PF_ROUTE protocol family
[    0.359701] DMA: preallocated 128 KiB GFP_KERNEL pool for atomic allocations
[    0.361013] DMA: preallocated 128 KiB GFP_KERNEL|GFP_DMA pool for atomic allocations
[    0.361921] DMA: preallocated 128 KiB GFP_KERNEL|GFP_DMA32 pool for atomic allocations
[    0.362746] audit: initializing netlink subsys (disabled)
[    0.363325] audit: type=2000 audit(1674798594.637:1): state=initialized audit_enabled=0 res=1
[    0.363325] thermal_sys: Registered thermal governor 'fair_share'
[    0.363325] thermal_sys: Registered thermal governor 'bang_bang'
[    0.363325] thermal_sys: Registered thermal governor 'step_wise'
[    0.364971] thermal_sys: Registered thermal governor 'user_space'
[    0.365682] thermal_sys: Registered thermal governor 'power_allocator'
[    0.366337] EISA bus registered
[    0.367271] cpuidle: using governor ladder
[    0.367616] cpuidle: using governor menu
[    0.369047] PCI: Using configuration type 1 for base access
[    0.371011] Kprobes globally optimized
[    0.371378] HugeTLB registered 1.00 GiB page size, pre-allocated 0 pages
[    0.371378] HugeTLB registered 2.00 MiB page size, pre-allocated 0 pages
[    0.373053] ACPI: Interpreter disabled.
[    0.373350] iommu: Default domain type: Translated 
[    0.373350] iommu: DMA domain TLB invalidation policy: lazy mode 
[    0.376980] vgaarb: loaded
[    0.377344] SCSI subsystem initialized
[    0.377600] usbcore: registered new interface driver usbfs
[    0.377600] usbcore: registered new interface driver hub
[    0.377741] usbcore: registered new device driver usb
[    0.378091] pps_core: LinuxPPS API ver. 1 registered
[    0.378415] pps_core: Software ver. 5.3.6 - Copyright 2005-2007 Rodolfo Giometti 
[    0.379028] PTP clock support registered
[    0.379316] EDAC MC: Ver: 3.0.0
[    0.381070] NetLabel: Initializing
[    0.381298] NetLabel:  domain hash size = 128
[    0.381582] NetLabel:  protocols = UNLABELED CIPSOv4 CALIPSO
[    0.381967] NetLabel:  unlabeled traffic allowed by default
[    0.382356] PCI: Probing PCI hardware
[    0.382356] PCI host bridge to bus 0000:00
[    0.382356] pci_bus 0000:00: root bus resource [io  0x0000-0xffff]
[    0.382356] pci_bus 0000:00: root bus resource [mem 0x00000000-0x7fffffffff]
[    0.382388] pci_bus 0000:00: No busn resource found for root bus, will use [bus 00-ff]
[    0.383074] pci 0000:00:00.0: [1af4:1041] type 00 class 0x020000
[    0.384986] pci 0000:00:00.0: reg 0x10: [io  0x6200-0x62ff]
[    0.385384] pci 0000:00:00.0: reg 0x14: [mem 0xd2000000-0xd20000ff]
[    0.385820] pci 0000:00:00.0: reg 0x18: [mem 0xd2000400-0xd20007ff]
[    0.394166] pci_bus 0000:00: busn_res: [bus 00-ff] end is updated to 00
[    0.394690] clocksource: Switched to clocksource kvm-clock
[    0.407575] VFS: Disk quotas dquot_6.6.0
[    0.407909] VFS: Dquot-cache hash table entries: 512 (order 0, 4096 bytes)
[    0.408491] AppArmor: AppArmor Filesystem Enabled
[    0.408831] pnp: PnP ACPI: disabled
[    0.410916] NET: Registered PF_INET protocol family
[    0.411442] IP idents hash table entries: 16384 (order: 5, 131072 bytes, linear)
[    0.412701] tcp_listen_portaddr_hash hash table entries: 512 (order: 1, 8192 bytes, linear)
[    0.413465] TCP established hash table entries: 8192 (order: 4, 65536 bytes, linear)
[    0.414181] TCP bind hash table entries: 8192 (order: 5, 131072 bytes, linear)
[    0.414825] TCP: Hash tables configured (established 8192 bind 8192)
[    0.415558] MPTCP token hash table entries: 1024 (order: 2, 24576 bytes, linear)
[    0.416173] UDP hash table entries: 512 (order: 2, 16384 bytes, linear)
[    0.416776] UDP-Lite hash table entries: 512 (order: 2, 16384 bytes, linear)
[    0.417414] NET: Registered PF_UNIX/PF_LOCAL protocol family
[    0.417903] NET: Registered PF_XDP protocol family
[    0.418322] pci_bus 0000:00: resource 4 [io  0x0000-0xffff]
[    0.418794] pci_bus 0000:00: resource 5 [mem 0x00000000-0x7fffffffff]
[    0.419406] PCI: CLS 0 bytes, default 64
[    0.419810] clocksource: tsc: mask: 0xffffffffffffffff max_cycles: 0x396d566cf43, max_idle_ns: 881590760263 ns
[    0.419933] Trying to unpack rootfs image as initramfs...
[    0.421289] clocksource: Switched to clocksource tsc
[    0.421757] platform rtc_cmos: registered platform RTC device (no PNP device found)
[    0.423248] Initialise system trusted keyrings
[    0.423671] Key type blacklist registered
[    0.424313] workingset: timestamp_bits=36 max_order=18 bucket_order=0
[    0.426758] zbud: loaded
[    0.427453] squashfs: version 4.0 (2009/01/31) Phillip Lougher
[    0.428192] fuse: init (API version 7.34)
[    0.428890] integrity: Platform Keyring initialized
[    0.430492] Freeing initrd memory: 2576K
[    0.435013] Key type asymmetric registered
[    0.435289] Asymmetric key parser 'x509' registered
[    0.435621] Block layer SCSI generic (bsg) driver version 0.4 loaded (major 243)
[    0.436190] io scheduler mq-deadline registered
[    0.436884] shpchp: Standard Hot Plug PCI Controller Driver version: 0.4
[    0.438064] Serial: 8250/16550 driver, 32 ports, IRQ sharing enabled
[    0.459372] serial8250: ttyS0 at I/O 0x3f8 (irq = 4, base_baud = 115200) is a U6_16550A
[    0.480907] serial8250: ttyS1 at I/O 0x2f8 (irq = 3, base_baud = 115200) is a U6_16550A
[    0.502620] serial8250: ttyS2 at I/O 0x3e8 (irq = 4, base_baud = 115200) is a U6_16550A
[    0.505001] Linux agpgart interface v0.103
[    0.508374] loop: module loaded
[    0.509013] tun: Universal TUN/TAP device driver, 1.6
[    0.509497] PPP generic driver version 2.4.2
[    0.509993] VFIO - User Level meta-driver version: 0.3
[    0.510593] ehci_hcd: USB 2.0 'Enhanced' Host Controller (EHCI) Driver
[    0.511181] ehci-pci: EHCI PCI platform driver
[    0.511689] ehci-platform: EHCI generic platform driver
[    0.512245] ohci_hcd: USB 1.1 'Open' Host Controller (OHCI) Driver
[    0.512857] ohci-pci: OHCI PCI platform driver
[    0.513377] ohci-platform: OHCI generic platform driver
[    0.513947] uhci_hcd: USB Universal Host Controller Interface driver
[    0.514683] i8042: PNP detection disabled
[    0.515301] serio: i8042 KBD port at 0x60,0x64 irq 1
[    0.516030] mousedev: PS/2 mouse device common for all mice
[    0.516659] input: AT Raw Set 2 keyboard as /devices/platform/i8042/serio0/input/input0
[    0.517669] rtc_cmos rtc_cmos: only 24-hr supported
[    0.518179] i2c_dev: i2c /dev entries driver
[    0.518713] device-mapper: core: CONFIG_IMA_DISABLE_HTABLE is disabled. Duplicate IMA measurements will not be recorded in the IMA log.
[    0.520248] device-mapper: uevent: version 1.0.3
[    0.521055] device-mapper: ioctl: 4.45.0-ioctl (2021-03-22) initialised: dm-devel@redhat.com
[    0.522019] platform eisa.0: Probing EISA bus 0
[    0.522587] eisa 00:00: EISA: Mainboard @@@0000 detected
[    0.523175] eisa 00:01: EISA: slot 1: @@@0000 detected (disabled)
[    0.523834] eisa 00:02: EISA: slot 2: @@@0000 detected (disabled)
[    0.524514] eisa 00:03: EISA: slot 3: @@@0000 detected (disabled)
[    0.525209] eisa 00:04: EISA: slot 4: @@@0000 detected (disabled)
[    0.525882] eisa 00:05: EISA: slot 5: @@@0000 detected (disabled)
[    0.526553] eisa 00:06: EISA: slot 6: @@@0000 detected (disabled)
[    0.527228] eisa 00:07: EISA: slot 7: @@@0000 detected (disabled)
[    0.527902] eisa 00:08: EISA: slot 8: @@@0000 detected (disabled)
[    0.528538] platform eisa.0: EISA: Detected 8 cards
[    0.529059] intel_pstate: CPU model not supported
[    0.529779] ledtrig-cpu: registered to indicate activity on CPUs
[    0.530507] intel_pmc_core intel_pmc_core.0:  initialized
[    0.531105] drop_monitor: Initializing network drop monitor service
[    0.531923] NET: Registered PF_INET6 protocol family
[    0.535712] Segment Routing with IPv6
[    0.536127] In-situ OAM (IOAM) with IPv6
[    0.536550] NET: Registered PF_PACKET protocol family
[    0.537127] Key type dns_resolver registered
[    0.538631] IPI shorthand broadcast: enabled
[    0.539012] sched_clock: Marking stable (532988952, 5568579)->(558785633, -20228102)
[    0.540219] registered taskstats version 1
[    0.540813] Loading compiled-in X.509 certificates
[    0.542057] Loaded X.509 cert 'Build time autogenerated kernel key: 25cc8cb7907826729975261abe82eb726e9a7e0c'
[    0.544528] zswap: loaded using pool lzo/zbud
[    0.545873] Key type ._fscrypt registered
[    0.546290] Key type .fscrypt registered
[    0.546703] Key type fscrypt-provisioning registered
[    0.549064] Key type encrypted registered
[    0.549676] AppArmor: AppArmor sha1 policy hashing enabled
[    0.550410] ima: No TPM chip found, activating TPM-bypass!
[    0.550989] Loading compiled-in module X.509 certificates
[    0.551999] Loaded X.509 cert 'Build time autogenerated kernel key: 25cc8cb7907826729975261abe82eb726e9a7e0c'
[    0.552950] ima: Allocated hash algorithm: sha1
[    0.553710] ima: No architecture policies found
[    0.554232] evm: Initialising EVM extended attributes:
[    0.554765] evm: security.selinux
[    0.555142] evm: security.SMACK64
[    0.555494] evm: security.SMACK64EXEC
[    0.555881] evm: security.SMACK64TRANSMUTE
[    0.556308] evm: security.SMACK64MMAP
[    0.556692] evm: security.apparmor
[    0.557060] evm: security.ima
[    0.557367] evm: security.capability
[    0.557750] evm: HMAC attrs: 0x1
[    0.558401] PM:   Magic number: 7:314:821
[    0.558969] RAS: Correctable Errors collector initialized.
[    0.561277] Freeing unused decrypted memory: 2036K
[    0.562714] Freeing unused kernel image (initmem) memory: 2932K
[    0.589485] Write protecting the kernel read-only data: 30720k
[    0.597456] Freeing unused kernel image (text/rodata gap) memory: 2036K
[    0.604078] Freeing unused kernel image (rodata/data gap) memory: 1796K
[    0.659967] x86/mm: Checked W+X mappings: passed, no W+X pages found.
[    0.660538] Run /init as init process
Please press Enter to activate this console. 
/ # 

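Running top inside the virtual machine

Multi-core virtualization

The test platform has 8 cores, and by default the code sets the number of VCPUs to the actual number of host cores, so the figure above shows 8 active CPUs.

The code shows that each VCPU corresponds to one thread of the host process. The --cpus option requests a different number of VCPUs; the example below asks for 32 on the 8-core host: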

$ sudo ./lkvm run -k ../linux-5.15.18/arch/x86/boot/bzImage -i ../busybox-1.32.0/_install/root_fs.cpio --cpus=32 --name zilong
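Since each VCPU is backed by a host thread, the thread list of the lkvm process gives a rough cross-check of the --cpus setting. The sketch below reads thread names from /proc on a Linux host. The function name list_threads is made up here, and because lkvm also runs helper threads besides the VCPU threads, the total will not equal the VCPU count exactly.

```shell
# list_threads PID: print "TID NAME" for every thread of process PID,
# using the per-thread comm files under /proc/PID/task.
list_threads() {
    pid=$1
    [ -d "/proc/$pid/task" ] || { echo "no such process: $pid" >&2; return 1; }
    for t in /proc/"$pid"/task/*; do
        # A thread may exit between the glob and the read; skip it then.
        [ -r "$t/comm" ] || continue
        printf '%s %s\n' "${t##*/}" "$(cat "$t/comm")"
    done
    return 0
}

# Example, with a guest already running (the pgrep filter is a guess at
# how the process appears on the host; adjust as needed):
#   list_threads "$(pgrep -o -x lkvm)"
```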

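Code analysis

The argument that follows lkvm selects the entry function of the subcommand to run. For example, a virtual machine is started with lkvm run, whose entry function is kvm_cmd_run:

kvm_cmd_run calls kvm_cmd_run_work to continue launching the virtual machine; for each VCPU it creates a pthread that runs the guest OS.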

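Setting up CPUID

While the guest is running, executing the cpuid instruction to query processor information causes a VM exit into the host, which emulates the instruction:

Because each VCPU is bound to one thread of the host virtual machine process, during initialization every VCPU thread must write the CPUID data for the VCPU it represents into the host KVM driver (on x86 this is done with the KVM_SET_CPUID2 ioctl). The host KVM driver then uses this data to emulate cpuid after the guest OS exits from non-root mode to root mode. This is why each VCPU thread performs a CPUID setup step next.

The call chain is: kvm_cpu_thread->kvm_cpu__start->kvm_cpu__reset_vcpu->kvm_cpu__setup_cpuid.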

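I/O virtualization

The kernel KVM module provides a mechanism by which a region can be registered as an I/O trap: when the guest OS accesses this region, it exits non-root mode and enters the host. I/O virtualization is built on this mechanism. The core function is:

When a trap occurs, the guest OS exits to the host:

The following are executed in order: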

kvm_cpu__emulate_io->kvm__emulate_io->mmio->mmio_fn(vcpu, port, data, size, is_write, mmio->ptr);

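Finally the callback mmio_fn registered through kvm__register_iotrap is executed, implementing the device-specific I/O handling. This resembles QEMU TCG to some extent: TCG traps through helpers inserted during software translation, whereas KVM relies on hardware-supported traps, but the callback flows are very similar.

This concludes the test. Later we will dissect the kvmtool code step by step to deepen our understanding of how virtualization is implemented.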

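Notes

  1. The host must support hardware-assisted CPU virtualization: Intel processors need VT-x and AMD processors need AMD-V. Without it, /dev/kvm is missing and the run fails, as shown below. Only after running lscpu did it become clear that the platform in question was itself a VMware virtual machine.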

~/Workspace$ lscpu
Architecture:        x86_64
CPU op-mode(s):      32-bit, 64-bit
Byte Order:          Little Endian
CPU(s):              8
On-line CPU(s) list: 0-7
Thread(s) per core:  1
Core(s) per socket:  2
Socket(s):           4
NUMA node(s):        1
Vendor ID:           GenuineIntel
CPU family:          6
Model:               85
Model name:          Intel(R) Xeon(R) Gold 6242 CPU @ 2.80GHz
Stepping:            7
CPU MHz:             2793.437
BogoMIPS:            5586.87
Hypervisor vendor:   VMware
Virtualization type: full
L1d cache:           32K
L1i cache:           32K
L2 cache:            1024K
L3 cache:            22528K
NUMA node0 CPU(s):   0-7
Flags:               fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon nopl xtopology tsc_reliable nonstop_tsc cpuid pni pclmulqdq ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch cpuid_fault invpcid_single ssbd ibrs ibpb stibp ibrs_enhanced fsgsbase tsc_adjust bmi1 avx2 smep bmi2 invpcid avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xsaves arat pku ospke md_clear flush_l1d arch_capabilities
~/Workspace$
~/Workspace$ ls -l /dev/kvm
ls: cannot access '/dev/kvm': No such file or directory
~/Workspace$
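  2. The guest OS architecture must match the host OS architecture. Although I knew beforehand that KVM only supports guests of the same ISA on CPUs with virtualization support, I overlooked this at the start of the experiment and tried to boot the ARM zImage and filesystem from the blog post above. The run hung, and only after thinking of QEMU did the reason become clear.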

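  3. A detail learned from the second point: the bzImage file exists only on x86. The ARM architecture also accepts the make bzImage build command, but what it actually produces is a zImage, not a bzImage.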
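The first note can be checked up front on an x86 host by looking for the vmx (Intel VT-x) or svm (AMD-V) flag in /proc/cpuinfo; the lscpu output above lists neither. The function name hw_virt_flag is made up for this sketch, and the optional file argument exists only so that the logic can be tried on saved cpuinfo text.

```shell
# hw_virt_flag [CPUINFO]: print "vmx" (Intel VT-x) or "svm" (AMD-V) if the
# flag is listed in CPUINFO (default /proc/cpuinfo); fail otherwise.
hw_virt_flag() {
    info=${1:-/proc/cpuinfo}
    [ -r "$info" ] || return 1
    # Take the first "flags" line and look for vmx or svm as whole words.
    flag=$(grep -m1 -E '^flags[[:space:]]*:' "$info" | grep -ow -E 'vmx|svm' | head -n 1)
    [ -n "$flag" ] && echo "$flag"
}

# Example:
#   hw_virt_flag || echo "no VT-x/AMD-V flag: KVM will not be available"
```

A present flag does not guarantee that /dev/kvm exists, since virtualization can still be disabled in firmware or the kvm modules may not be loaded.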

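Summary

The KVM hypervisor is a Type 2 hypervisor, so QEMU-KVM and kvmtool, both built on KVM, are Type 2 implementations as well. kvmtool is very similar to QEMU; the overall architecture is shown in the figure below:

As a lightweight KVM virtual machine implementation, kvmtool is worth studying further to see how it boots a kernel from scratch and to understand virtualization in depth. That will be very helpful when learning other modules later, such as virtio and I/O virtualization.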

References

https://blog.csdn.net/Linux_Everything/article/details/117538064

https://zhuanlan.zhihu.com/p/545241171

https://zhuanlan.zhihu.com/p/583203148

https://blog.csdn.net/qq_41146650/article/details/124595502
