
I have read some answers related to this problem (for example, "Will the OS crash if the system partition can't be accessed for a short period?"), but they did not solve it for me.

When using iSCSI as a Storage Repository in XenServer, and a DomU (VM) is under heavy disk I/O, the DomU filesystem (especially ext3) is corrupted if the iSCSI connection is lost (mainly due to a network problem or a storage failover). In this case, the DomU's ext3 filesystem becomes read-only or unrecoverable.

How can I protect the VM's filesystem when the iSCSI connection is lost on Dom0?

This is my XenServer environment:

    [root@cnode01-m ~]# iscsiadm -m session
    tcp: [1] 10.32.1.240:3260,2 iqn.1986-03.com.sun:02:c5544ae6-9715-6f38-f83b-a446896ac614
    tcp: [3569] 10.32.1.240:3260,2 iqn.1986-03.com.sun:02:5c41ce31-3fbb-c6aa-d479-947e85515ac7
    [root@cnode01-m ~]# vgs
      VG                                                 #PV #LV #SN Attr   VSize   VFree
      VG_XenStorage-1aeee13b-2a87-1d0d-1834-7b8c868009b0   1  40   0 wz--n-   6.35T   4.93T
      VG_XenStorage-28e2c663-dae5-9504-9733-e05063ff081d   1  57   0 wz--n-   6.35T   4.52T
      VG_XenStorage-365d6e13-5caa-1fea-9940-e1bb553e3513   1  42   0 wz--n-   6.35T   5.13T
      VG_XenStorage-4ea23f9a-f945-5d45-cbd2-f3eab3fe75b3   1  42   0 wz--n-   6.35T   5.40T
      VG_XenStorage-54d69165-2eed-c058-d587-1b84d488adea   1  37   0 wz--n-   6.35T   5.01T
      VG_XenStorage-598b7237-282b-ea61-8edc-5101a70ea001   1  63   0 wz--n-   6.35T   5.01T
      VG_XenStorage-6a063762-26de-a3f8-f18c-734fce25433a   1  49   0 wz--n-   6.35T   5.56T
      VG_XenStorage-6b7bea84-7269-fa88-7b95-23dce431e1aa   1  71   0 wz--n-   6.35T   4.80T
      VG_XenStorage-6d6d263b-243c-fb24-4f0c-28b226a22bab   1  47   0 wz--n-   6.35T   4.94T
      VG_XenStorage-76fe6d6d-a37a-698d-9af2-50ea3f55e127   1  44   0 wz--n-   6.35T   5.37T
      VG_XenStorage-80e2df33-268c-b8a6-cc02-71f27ebe3326   1  39   0 wz--n-   6.35T   5.80T
      VG_XenStorage-886070b7-34e8-eb96-0931-2c31952608a6   1  13   0 wz--n- 457.65G 369.31G
      VG_XenStorage-97136f70-cf33-2593-38e0-b8c09785a754   1  60   0 wz--n-   6.35T   5.14T
      VG_XenStorage-c910e9fd-8817-0b99-8c8d-1ee0883705de   1  37   0 wz--n-   6.35T   5.67T
      VG_XenStorage-cd709bcb-d46a-8483-acbf-49b2b0c59c06   1  58   0 wz--n-   6.35T   4.80T
      VG_XenStorage-e153d09a-716a-9764-8967-f704278d55bd   1  43   0 wz--n-   6.35T   4.45T
      VG_XenStorage-f8574b51-31d4-7b0e-c71e-8253e1cdd230   1  61   0 wz--n-   6.35T   4.20T
    [root@cnode01-m ~]# ls -la /dev/sd[a-z]
    brw-r----- 1 root disk  8,   0 Jun  8 17:37 /dev/sda
    brw-r----- 1 root disk  8,  16 Aug  1 10:14 /dev/sdb
    brw-r----- 1 root disk  8,  32 Jun  8 17:38 /dev/sdc
    brw-r----- 1 root disk  8,  48 Jul 31 14:49 /dev/sdd
    brw-r----- 1 root disk  8,  64 Jul 31 14:46 /dev/sde
    brw-r----- 1 root disk  8,  80 Jul 31 14:51 /dev/sdf
    brw-r----- 1 root disk  8,  96 Aug  3 13:52 /dev/sdg
    brw-r----- 1 root disk  8, 112 Aug  3 10:53 /dev/sdh
    brw-r----- 1 root disk  8, 128 Aug  2 13:40 /dev/sdi
    brw-r----- 1 root disk  8, 144 Jul 30 00:17 /dev/sdj
    brw-r----- 1 root disk  8, 160 Jul 30 00:17 /dev/sdk
    brw-r----- 1 root disk  8, 176 Jul 30 00:17 /dev/sdl
    brw-r----- 1 root disk  8, 192 Jul 30 00:17 /dev/sdm
    brw-r----- 1 root disk  8, 208 Jul 30 00:17 /dev/sdn
    brw-r----- 1 root disk  8, 224 Jul 30 00:17 /dev/sdo
    brw-r----- 1 root disk  8, 240 Jul 30 00:17 /dev/sdp
    brw-r----- 1 root disk 65,   0 Jul 30 00:17 /dev/sdq

This is my DomU (VM) environment:

    [root@i-58-7172-VM ~]# df -h
    Filesystem                       Size  Used Avail Use% Mounted on
    /dev/mapper/VolGroup00-LogVol00   16G  1.5G   14G  11% /
    /dev/xvda1                        99M   30M   65M  32% /boot
    tmpfs                            512M     0  512M   0% /dev/shm

When I put a heavy I/O load on the VM's / partition and the iSCSI connection has a problem (network issue or iSCSI target failover), the / partition is corrupted.

How can I solve this problem? Thank you in advance.

Added

This is my iscsid.conf on Dom0:

    [root@cnode01-m ~]# more /etc/iscsi/iscsid.conf
    node.startup = manual
    node.session.timeo.replacement_timeout = 86400
    node.conn[0].timeo.login_timeout = 15
    node.conn[0].timeo.logout_timeout = 15
    node.conn[0].timeo.noop_out_interval = 0
    node.conn[0].timeo.noop_out_timeout = 0
    node.session.initial_login_retry_max = 4
    node.session.cmds_max = 128
    node.session.queue_depth = 32
    node.session.iscsi.InitialR2T = No
    node.session.iscsi.ImmediateData = Yes
    node.session.iscsi.FirstBurstLength = 262144
    node.session.iscsi.MaxBurstLength = 16776192
    node.conn[0].iscsi.MaxRecvDataSegmentLength = 131072
    discovery.sendtargets.iscsi.MaxRecvDataSegmentLength = 32768
    node.session.iscsi.FastAbort = No
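Note that this configuration disables NOP-Out keepalives (`noop_out_interval = 0`), so a dead connection may not be detected promptly. A commonly suggested alternative (the values below are illustrative assumptions, not tested recommendations for this setup) is to enable keepalives so recovery starts quickly, while the long `replacement_timeout` keeps I/O queued in the initiator instead of failing it up to the DomU:

    # /etc/iscsi/iscsid.conf -- illustrative fragment, values are assumptions
    # Send a NOP-Out ping every 10s; declare the connection dead if no
    # reply arrives within 15s, so session recovery starts promptly.
    node.conn[0].timeo.noop_out_interval = 10
    node.conn[0].timeo.noop_out_timeout = 15
    # While the session is re-established, queue I/O in the initiator
    # rather than returning errors to the upper layers.
    node.session.timeo.replacement_timeout = 86400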

10G Ethernet and jumbo frames are already implemented at the storage layer. Citrix XenServer also has a command for pausing VMs when the storage service has an issue, but pausing and unpausing a VM makes its system clock inconsistent, which may cause side effects, typically at the application layer, I think.

1 Answer


First, you should address the source of the issue: storage access. With iSCSI you can tweak iscsid.conf, increasing the queue depth, buffer sizes, and timeouts so the connection can sustain longer outages. Besides that, implementing multipathing, 10G Ethernet (if the SAN supports it), and jumbo frames is a good idea.
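For the multipathing part, device-mapper-multipath can be told to queue I/O while all paths are down instead of failing it upward. A minimal sketch, assuming a dm-multipath setup (the vendor/product match is a placeholder you would adjust to your array):

    # /etc/multipath.conf -- illustrative fragment, not a tested config
    defaults {
        user_friendly_names yes
    }
    devices {
        device {
            vendor  "SUN"      # assumption: match your array's vendor string
            product "*"
            # Queue I/O indefinitely while no path is available, instead
            # of returning errors to the upper layers (and to the DomU).
            no_path_retry queue
        }
    }

The trade-off of `no_path_retry queue` is that processes doing I/O will hang until a path returns, which is usually preferable to a guest filesystem going read-only.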

I'm no Xen expert, but with KVM there is an option to pause VMs when the storage layer returns EIO or ENOSPC. Something similar should be possible with Xen if you dig into the options, IMO, and if not, I'd try filing a feature request with the developers.
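For reference, the KVM behaviour mentioned here is QEMU's per-drive error policy; a sketch of what it looks like (the disk path is a placeholder):

    # QEMU command line: pause the guest on a write/read error instead of
    # passing the error through to the guest kernel
    qemu-kvm -drive file=/dev/mapper/guest-disk,if=virtio,werror=stop,rerror=stop

    # Roughly equivalent libvirt disk XML:
    #   <driver name='qemu' type='raw' error_policy='stop'/>

Once storage is back, the guest can be resumed and retries the failed I/O, rather than seeing an error and remounting its filesystem read-only.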

  • Thank you so much for your reply. I updated my question. Commented Aug 4, 2011 at 0:26
  • By multipathing, do you mean multiple paths between the XenServer host and the storage node, right? Commented Aug 4, 2011 at 2:10
  • Could be - I'm no Xen expert. So, to start, you could look for the best values for your particular iSCSI target. As for pausing: while timing might be an issue, a corrupted filesystem is much more of an issue, IMO. Commented Aug 4, 2011 at 12:42
