|
|
Integrating Ironic with native Neutron

Deployment and configuration:
- Ironic runs its own DHCP server, which is used during the inspect phase.
- neutron-dhcp is used during the provision phase.
- The inspect and provision phases may use different TFTP servers.

Register phase
The user enrolls the Ironic node, including its IPMI details.
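
Inspect phase
This phase uses the Inspect Network. Requirements:
- The Ironic DHCP server can receive DHCP requests from the BM node.
- Once the BM node has an IP, it can reach tftp-server-1 (layer-3 reachable).
The user collects information about the BM node:
- Ironic sets the BM node to PXE boot over IPMI.
- Ironic powers on the BM node over IPMI, and the node PXE boots.
- The BM node gets an IP from the Ironic DHCP server. At this point the node's requests carry no VLAN tag and use the native VLAN of the uplink access switch (default tag=1).
- Once it has an IP, the BM node downloads a small image (the ramdisk, which contains the Ironic Python Agent) from tftp-server-1.
- The ramdisk performs some operations to collect detailed information about the BM node.
- The BM node is powered off. The ramdisk runs in memory, so it is lost on shutdown.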
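
Provision phase
This phase uses the Provisioning Network (created by Neutron). Requirements:
- The BM node, glance-api, ironic-api, ironic-conductor, and neutron-dhcp-agent must all have connectivity on the provisioning network.
The user requests a physical machine, installs the operating system, configures the service NICs, and so on:
- The request enters through nova.
- Ironic powers on the BM node over IPMI, and the node PXE boots.
- At this point the BM node must get its IP from the Neutron DHCP server (over the native VLAN). Because the Ironic DHCP server also accepts requests arriving on the native VLAN, the DHCP request must be guaranteed to be handled by the Neutron DHCP server.
- Once it has an IP, the BM node downloads the small image (the ramdisk, which contains the Ironic Python Agent) from tftp-server-2 (which may differ from the TFTP server used in the inspect phase).
- (How is this step controlled?) The image requested by the user is downloaded from glance and installed (the assigned IP must be able to reach glance-api).
- After installation, cloud-init applies the corresponding VLAN tag inside the BM operating system (that VLAN tag must already be configured on the access switch).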
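
Key issues:
- Both the Ironic DHCP server and the Neutron DHCP server accept DHCP requests arriving on the native VLAN, so two BM nodes running Inspect and Provision at the same time may conflict.

- One option is to merge the two DHCP servers. However, the Neutron DHCP server works from an allowlist, and during the Inspect phase the DHCP server knows nothing about the BM node yet, so no allowlist entry can be configured.
- Another option is to keep Inspect and Provision strictly separate. During data-center initialization, start the Ironic DHCP server and shut it down once Inspect is done; alternatively, have EPC enforce that Provision operations are disabled while Inspect is running.

* The ZTE solution for the tier-1 private cloud merges the two DHCP servers and runs the result on the ToR switch.

- The tenant VLAN of a BM node must be configured on the access switch in advance; if that is not possible, the switch has to be configured dynamically.
- neutron-dhcp-agent must be on the service network.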
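
To make the first key issue concrete, here is a minimal Python sketch of the allowlist behaviour described in the list above. It is illustrative only: the function name `dhcp_responders`, the MAC addresses, and the modelling of each server as a simple yes/no decision are assumptions, not Ironic or Neutron code.

```python
def dhcp_responders(mac, neutron_allowlist, inspect_dhcp_running):
    """Return the DHCP servers that would answer a request from `mac`.

    Assumptions (hypothetical model, not real Ironic/Neutron code):
    - the Neutron DHCP server only answers MACs on its allowlist;
    - the Ironic inspect DHCP server answers any MAC on the native VLAN.
    """
    responders = []
    if mac.lower() in neutron_allowlist:
        responders.append("neutron-dhcp")
    if inspect_dhcp_running:
        responders.append("ironic-dhcp")
    return responders


# Hypothetical MACs: one node already known to Neutron, one not yet enrolled.
allowlist = {"52:54:00:aa:bb:01"}

# Provisioning a known node while the inspect DHCP server is up: two answers.
both_running = dhcp_responders("52:54:00:aa:bb:01", allowlist, inspect_dhcp_running=True)

# Strict separation: the inspect DHCP server is shut down, only Neutron answers.
separated = dhcp_responders("52:54:00:aa:bb:01", allowlist, inspect_dhcp_running=False)

# A node that has not been inspected yet is unknown to the allowlist.
unknown = dhcp_responders("52:54:00:aa:bb:02", allowlist, inspect_dhcp_running=False)

print(both_running, separated, unknown)
```

The third case shows why simply merging the two servers is awkward: an allowlist-only server has nothing to say to a node that has never been inspected.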

Suzhou Ironic environment
10.142.24.12 root/@IDC_host4321

Zhejiang Ironic test environment

Ironic DHCP
[root@csv-yglcs17 ~]# cat /etc/dhcp/dhc
dhclient.d/ dhcpd6.conf dhcpd.conf
[root@csv-yglcs17 ~]# cat /etc/dhcp/dhcpd.conf
option domain-name "test.com";
option domain-name-servers 8.8.8.8, 61.88.88.88;
default-lease-time 60000;
max-lease-time 720000;
subnet 20.26.34.0 netmask 255.255.255.0 {
  range 20.26.34.10 20.26.34.100; <== DHCP range
  option routers 20.26.34.1;
  next-server 20.26.33.26; <== TFTP server
  filename "pxelinux.0";
}
subnet 20.26.33.0 netmask 255.255.255.0 { <== the conductor node only has an IP on the 33.0 subnet; without this subnet declaration, dhcpd fails at startup with the error below
}

Problem:
Apr 19 14:30:21 csv-yglcs17 systemd: Starting DHCPv4 Server Daemon...
Apr 19 14:30:21 csv-yglcs17 dhcpd: Internet Systems Consortium DHCP Server 4.2.5
Apr 19 14:30:21 csv-yglcs17 dhcpd: Copyright 2004-2013 Internet Systems Consortium.
Apr 19 14:30:21 csv-yglcs17 dhcpd: All rights reserved.
Apr 19 14:30:21 csv-yglcs17 dhcpd: For info, please visit https://www.isc.org/software/dhcp/
Apr 19 14:30:21 csv-yglcs17 dhcpd: Not searching LDAP since ldap-server, ldap-port and ldap-base-dn were not specified in the config file
Apr 19 14:30:21 csv-yglcs17 dhcpd: Wrote 15 leases to leases file.
Apr 19 14:30:21 csv-yglcs17 dhcpd:
Apr 19 14:30:21 csv-yglcs17 dhcpd: No subnet declaration for eno33557248 (no IPv4 addresses).
Apr 19 14:30:21 csv-yglcs17 dhcpd: ** Ignoring requests on eno33557248. If this is not what
Apr 19 14:30:21 csv-yglcs17 dhcpd: you want, please write a subnet declaration
Apr 19 14:30:21 csv-yglcs17 dhcpd: in your dhcpd.conf file for the network segment
Apr 19 14:30:21 csv-yglcs17 dhcpd: to which interface eno33557248 is attached. **
Apr 19 14:30:21 csv-yglcs17 dhcpd:
Apr 19 14:30:21 csv-yglcs17 dhcpd:
Apr 19 14:30:21 csv-yglcs17 dhcpd: No subnet declaration for virbr0 (192.168.122.1).
Apr 19 14:30:21 csv-yglcs17 dhcpd: ** Ignoring requests on virbr0. If this is not what
Apr 19 14:30:21 csv-yglcs17 dhcpd: you want, please write a subnet declaration
Apr 19 14:30:21 csv-yglcs17 dhcpd: in your dhcpd.conf file for the network segment
Apr 19 14:30:21 csv-yglcs17 dhcpd: to which interface virbr0 is attached. **
Apr 19 14:30:21 csv-yglcs17 dhcpd:
Apr 19 14:30:21 csv-yglcs17 dhcpd:
Apr 19 14:30:21 csv-yglcs17 dhcpd: No subnet declaration for eno16777984 (20.26.33.26).
Apr 19 14:30:21 csv-yglcs17 dhcpd: ** Ignoring requests on eno16777984. If this is not what
Apr 19 14:30:21 csv-yglcs17 dhcpd: you want, please write a subnet declaration
Apr 19 14:30:21 csv-yglcs17 dhcpd: in your dhcpd.conf file for the network segment
Apr 19 14:30:21 csv-yglcs17 dhcpd: to which interface eno16777984 is attached. **
Apr 19 14:30:21 csv-yglcs17 dhcpd:
Apr 19 14:30:21 csv-yglcs17 dhcpd:
Apr 19 14:30:21 csv-yglcs17 dhcpd: Not configured to listen on any interfaces!
Apr 19 14:30:21 csv-yglcs17 dhcpd:
Apr 19 14:30:21 csv-yglcs17 dhcpd: This version of ISC DHCP is based on the release available
Apr 19 14:30:21 csv-yglcs17 dhcpd: on ftp.isc.org. Features have been added and other changes
Apr 19 14:30:21 csv-yglcs17 dhcpd: have been made to the base software release in order to make
Apr 19 14:30:21 csv-yglcs17 dhcpd: it work better with this distribution.
Apr 19 14:30:21 csv-yglcs17 dhcpd:
Apr 19 14:30:21 csv-yglcs17 dhcpd: Please report for this software via the CentOS Bugs Database:
Apr 19 14:30:21 csv-yglcs17 dhcpd: http://bugs.centos.org/
Apr 19 14:30:21 csv-yglcs17 dhcpd:
Apr 19 14:30:21 csv-yglcs17 dhcpd: exiting.
Apr 19 14:30:21 csv-yglcs17 systemd: dhcpd.service: main process exited, code=exited, status=1/FAILURE
Apr 19 14:30:21 csv-yglcs17 systemd: Failed to start DHCPv4 Server Daemon.
Apr 19 14:30:21 csv-yglcs17 systemd: Unit dhcpd.service entered failed state.
Apr 19 14:30:21 csv-yglcs17 systemd: dhcpd.service failed.
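
The dhcpd failure in the log above comes down to a simple check: dhcpd needs a subnet declaration that covers the address of at least one interface it can listen on. The Python sketch below reproduces that check with the addresses from the log. The helper name `uncovered_interfaces` is hypothetical, and the logic is a simplification of what dhcpd does at startup, not its actual code.

```python
import ipaddress


def uncovered_interfaces(interfaces, subnets):
    """Return interface names whose IPv4 address is in none of the declared subnets.

    Simplified model of dhcpd's startup check (an assumption, not dhcpd's code).
    """
    networks = [ipaddress.ip_network(s) for s in subnets]
    missing = []
    for name, addr in interfaces.items():
        if addr is None:
            # No IPv4 address at all, as with eno33557248 in the log.
            missing.append(name)
            continue
        ip = ipaddress.ip_address(addr)
        if not any(ip in net for net in networks):
            missing.append(name)
    return missing


# Interface addresses as reported in the log.
interfaces = {
    "eno33557248": None,
    "virbr0": "192.168.122.1",
    "eno16777984": "20.26.33.26",
}

# Only 20.26.34.0/24 declared: no interface is covered, so dhcpd exits.
before = uncovered_interfaces(interfaces, ["20.26.34.0/24"])

# Adding the empty 20.26.33.0/24 declaration covers eno16777984.
after = uncovered_interfaces(interfaces, ["20.26.34.0/24", "20.26.33.0/24"])

print(before)
print(after)
```

With the second declaration in place, one interface remains usable, which is why the empty `subnet 20.26.33.0` block is enough to let the daemon start.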

Ironic Inspector
[root@csv-yglcs17 pxelinux.cfg]# pwd
/tftpboot/pxelinux.cfg
[root@csv-yglcs17 pxelinux.cfg]# cat default
default introspect
label introspect
kernel /tftpboot/ironic-inspector/inspector-kernel
append initrd=/tftpboot/ironic-inspector/inspector-ramdisk ipa-inspection-callback-url=http://20.26.33.26:5050/v1/continue systemd.journald.forward_to_console=yes ipa-collect-lldp=True
ipappend 3

The inspector runs on 20.26.33.26.

Ironic Provisioning
provisioning_network in ironic.conf is not configured yet; neither is cleaning_network.

Check IPMI
[root@csv-yglcs17 ~]# ipmitool -P Huawei12#$ -H 50.1.65.245 -U root -p 623 -I lanplus power status
Chassis Power is on
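
Because the IPMI password in this environment contains shell-special characters (`#` and `$`), scripted checks are less fragile when the arguments are passed as a list rather than through a shell. A minimal Python sketch, assuming `ipmitool` is installed on the host; the helper name `ipmitool_cmd` and the placeholder address and password are hypothetical.

```python
import shlex


def ipmitool_cmd(host, user, password, *action, port=623):
    """Build an ipmitool argument list; a list needs no shell quoting."""
    return [
        "ipmitool", "-I", "lanplus",
        "-H", host, "-p", str(port),
        "-U", user, "-P", password,
        *action,
    ]


# Placeholder credentials; substitute the real BMC address and password.
cmd = ipmitool_cmd("192.0.2.10", "root", "pa#ss$", "power", "status")

# shlex.join (Python 3.8+) shows the safely quoted equivalent shell command.
print(shlex.join(cmd))

# To actually run it (requires ipmitool and a reachable BMC):
#   import subprocess
#   subprocess.run(cmd, check=True)
```

The same helper can build the other calls used in these notes by changing the trailing action, for example `"chassis", "bootdev", "disk"`.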
$ d6 ]2 d* D, ?7 N" n# e3 V' O! W$ ^6 j, j8 @$ u( I: |# W% k
2 M2 z3 D1 `! e4 a+ \9 W' O
w% Y" [( z' e# `: |
== 操作 ==- w9 P9 K8 O ]/ F* b; b
" d0 B1 Z7 P6 ?
: {. u0 l6 P1 e) |7 m) K+ _9 |* ~! Y/ V+ H
& m" L7 k2 Q/ R2 P; J
4 Z r3 ]+ v) G6 e% W+ t: }$ e/ f! K% u+ X, ^' ]9 c1 c

0 i; M- S- S# k# _1 I! R0 V0 |/ g$ x
) |) c: q( |- ^) M$ O) h! r* _. D0 e* Y! V
ironic node-create --chassis_uuid dbb588b3-75e8-4028-b851-110671e05e58 \
    --driver agent_ipmitool \
    --name pc-zjnacthd01 \
    -i ipmi_address=50.1.65.245 \
    -i ipmi_username=root \
    -i ipmi_password=Huawei12#$ \
    -i ipmi_port=623 \
    -i deploy_kernel=4c1855e5-9b6b-47e2-89e5-3bc351c2ae2e \
    -i deploy_ramdisk=2f603c85-de92-44ea-b4d0-1396b91102cc

Update 5/25: An Ironic AZ feature is under development. node-update adds an AZ attribute to the node, and the attribute is synced to the nova database, so nova boot only needs to specify the AZ to create the machine.

Update 5/12:

After a successful inspect:

Inspect failed; see "Problem 2" for the cause.

Configure provisioning_network:

After a successful Inspect:

Upload the image used by Ironic:
glance image-create --name CentOS-7-64bit-ironic.qcow2 --disk-format qcow2 --container-format bare --file CentOS-7-64bit-ironic.qcow2 --is-public True --human-readable --progress
glance image-update 40928b81-9be1-402a-8684-4e2d2fcf330f --property hypervisor_type=baremetal

nova boot --flavor 2 --image 40928b81-9be1-402a-8684-4e2d2fcf330f --nic net-id=3a151049-ff3f-4bc5-88a1-b9084ec24bc9 pc-zjnacthd01

== Problems ==
- Are there restrictions on the node name?

- The first Inspect failed
2017-04-20 15:29:16.409 28596 ERROR ironic_inspector.main File "/usr/lib/python2.7/site-packages/keystoneauth1/access/service_catalog.py", line 228, in url_for
2017-04-20 15:29:16.409 28596 ERROR ironic_inspector.main raise exceptions.EndpointNotFound(msg)
2017-04-20 15:29:16.409 28596 ERROR ironic_inspector.main EndpointNotFound: public endpoint for baremetal service in RegionFour region not found
Resolved by restarting the ironic services.
- The second inspect failed: the BM node could not get an IP
The DHCP request did reach the DHCP server:

- cleaning_network could not be found during inspect
Configure cleaning_network (= provide_network).
- nova boot failed, conductor.log:

Resolved after updating the nova code on the controller node, the ironic code on the ironic node, and the ironicclient code on the compute node.
- nova boot failed, compute.log

The cause was that this ironic node's driver_info had not been updated yet:

Update it:
ironic node-update baa519fc-7c06-40f8-8e5a-5fd3b6e97e01 add driver_info/deploy_kernel=f8205536-070b-4286-8d0c-35e3b8647741
ironic node-update baa519fc-7c06-40f8-8e5a-5fd3b6e97e01 add driver_info/deploy_ramdisk=302e6438-4d31-429b-8bae-47e225d4ed67
Update 05/12:
ironic node-update baa519fc-7c06-40f8-8e5a-5fd3b6e97e01 add driver_info/deploy_kernel=4c1855e5-9b6b-47e2-89e5-3bc351c2ae2e
ironic node-update baa519fc-7c06-40f8-8e5a-5fd3b6e97e01 add driver_info/deploy_ramdisk=2f603c85-de92-44ea-b4d0-1396b91102cc

- nova boot failed: image not found, compute.log

glance-api was misconfigured in nova.conf on the compute node:

On the ironic-conductor node, set the glance API version to 1 in ironic.conf:

glance_api_version=1
- nova boot failed, ironic-conductor.log:

Verified from the command line that a port can be created on provisioning network d5a284c3-41d3-4eb3-a11f-58a99d3e2eb1.

The cause was that LLDP was not enabled. After enabling it:

ironic port-list | awk '{print $2}' | egrep '[0-9]+' | xargs -I 'X' ironic port-delete 'X'
ironic portgroup-list | awk '{print $2}' | egrep '[0-9]+' | xargs -I 'X' ironic portgroup-delete 'X'
Re-run Inspect:
ironic node-set-provision-state baa519fc-7c06-40f8-8e5a-5fd3b6e97e01 manage
ironic node-set-provision-state baa519fc-7c06-40f8-8e5a-5fd3b6e97e01 inspect
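
The two cleanup pipelines above select table rows with `egrep '[0-9]+'`, which matches any row that contains a digit. A stricter filter can be sketched in Python by matching full UUIDs. The sample table text below is made up for illustration, and the function name `extract_uuids` is hypothetical.

```python
import re

UUID_RE = re.compile(
    r"\b[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}\b",
    re.IGNORECASE,
)


def extract_uuids(table_text):
    """Return the first UUID found on each line of CLI table output."""
    uuids = []
    for line in table_text.splitlines():
        match = UUID_RE.search(line)
        if match:
            uuids.append(match.group(0))
    return uuids


# Made-up sample in the usual OpenStack CLI table layout.
sample = """\
+--------------------------------------+-------------------+
| UUID                                 | Address           |
+--------------------------------------+-------------------+
| 11111111-2222-3333-4444-555555555555 | 52:54:00:aa:bb:01 |
| aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee | 52:54:00:aa:bb:02 |
+--------------------------------------+-------------------+
"""

port_uuids = extract_uuids(sample)
print(port_uuids)
```

Each extracted UUID could then be handed to `ironic port-delete` or `ironic portgroup-delete`, as in the original pipelines.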
- nova boot failed: user image not found
The cause was a wrong database setting in glance-registry.conf.

- nova boot failed: ramdisk not found

This image UUID is configured in the ironic node's driver_info; the image must be uploaded to glance.

Upload the image:

Update the Ironic node information:

- nova boot failed: insufficient permissions to access tftp

chown -R ironic:ironic /tftpboot/

- nova boot failed: the physical machine's DHCP request was caught by ironic-dhcp
Shut down ironic-dhcp.

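- nova boot failed: the physical machine could not get an IP from neutron DHCP
On the controller node, neutron DHCP runs as dnsmasq inside a namespace. The relay target is the controller's management-network IP (eno16777984), while dnsmasq listens on the tap device inside the namespace, with IP 20.26.34.91, so it never receives the DHCP requests.
Current workaround: manually start a dnsmasq on the controller node with the same configuration as neutron DHCP.
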
- After getting an IP, the node boots into the ramdisk, but after the reboot it cannot boot into the user image's operating system
Checking the BIOS boot device order shows: Boot Device Selector : No override
ironic-conductor.log shows that it cannot connect to 20.26.34.70:9999. That is the IPA's address and listening port; the ironic-conductor node must be able to reach it, but it is indeed unreachable.

Yao Jun said that two NICs may have obtained IP addresses after the ramdisk booted, which confuses the routing, and suggested deleting the second address once the ramdisk is up.
05/04 update: add a static route on the provisioning network: destination = the controller node subnet, nexthop = the provisioning network gateway.
05/11 update: neutron subnet-update aca03dd8-3d2a-4c54-99de-7a8a7bac4f53 --host-route destination=20.26.33.0/24,nexthop=20.26.34.1
Updated subnet: aca03dd8-3d2a-4c54-99de-7a8a7bac4f53

Verified: the port is now reachable and the user image can be downloaded. ==> Why do multiple NICs get an IP, and how can this be solved at the code level?
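
The effect of the host route can be illustrated with a longest-prefix-match lookup. This is a simplified Python sketch, not how the kernel or the IPA is implemented; the routing-table representation and the `next_hop` helper are assumptions, while the addresses come from the commands above.

```python
import ipaddress


def next_hop(destination, routes):
    """Pick the route with the longest matching prefix; None if nothing matches.

    `routes` is a list of (cidr, gateway) pairs; gateway None means on-link.
    """
    dest = ipaddress.ip_address(destination)
    best = None
    for cidr, gateway in routes:
        net = ipaddress.ip_network(cidr)
        if dest in net and (best is None or net.prefixlen > best[0].prefixlen):
            best = (net, gateway)
    return best


# Routes the ramdisk would hold on its provisioning NIC (simplified assumption).
routes = [
    ("20.26.34.0/24", None),          # provisioning subnet, directly connected
    ("20.26.33.0/24", "20.26.34.1"),  # host route added with subnet-update
]

# Traffic to the controller subnet now leaves via the provisioning gateway.
net, gateway = next_hop("20.26.33.26", routes)
print(net, gateway)
```

Without the second entry the lookup for 20.26.33.26 finds no route on this NIC, which is consistent with the connection failure seen before the host route was added.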

Query the boot order over IPMI: ipmitool -P Huawei12#$ -H 50.1.65.245 -U root -p 623 -I lanplus chassis bootparam get 5
Set boot from disk: ipmitool -P Huawei12#$ -H 50.1.65.245 -U root -p 623 -I lanplus chassis bootdev disk

- The user image was written to /dev/sdl instead of the first disk, and the whole boot process timed out

a. Yao Jun modified the ramdisk so that it always writes to /dev/sda.
b. Set deploy_callback_timeout=900 in ironic.conf.

Update 05/04:
Li Hao: ironic node-update 4fae2ae3-0935-4585-8be2-00298015f8f3 replace properties/root_device='{"name": "/dev/sda"}'

- The image was written to /dev/sda, but ironic-conductor did not reboot the machine, so the boot hung
Use journalctl -fu python-ironic-agent to view the logs inside the IPA:
journalctl --no-pager

- After the image was written to /dev/sda, IPA failed to run partprobe /dev/sda

ironic-lib in the ramdisk needs this patch: https://review.openstack.org/#/c/444061/
|
|