MPIO global settings

From Open-E Wiki

Path grouping policy

To understand this option, it helps to know how multipath organizes paths. When a disk is connected through multiple paths and a multipath device is created, those paths can be organized into path groups. Sometimes there is just one group containing all paths, but that is not always the case. At any point in time, data is transferred through a single path group, which is selected according to its priority. Within the selected path group, data is distributed in round-robin fashion by default to increase throughput. Multipath offers several policies for creating path groups.

  1. Multibus (default) - only one path group is created, containing all available paths.
  2. Failover - every path is placed in a separate group.
  3. Group by node name - one group is created for each target node name.
  4. Group by priority value - paths with the same priority are placed in the same group.
  5. Group by serial number - paths that have the same serial number are placed in one group. Paths can report different serial numbers for what is actually the same disk, because some controllers or other devices used to connect the disks modify the serial number they report. It may make sense to put such paths in different groups, for example because a particular path takes a physically different route.
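
The grouping policies above can be illustrated with a short Python sketch. It is only a model of the idea, not multipath's actual implementation: the path records, their field names (`node`, `prio`, `serial`) and the `group_paths` helper are hypothetical, while the policy names follow the values used in multipath.conf.

```python
from collections import defaultdict

def group_paths(paths, policy):
    """Return a list of path groups (lists of path names) for a policy."""
    if policy == "multibus":
        # One group that contains every path.
        return [[p["name"] for p in paths]]
    if policy == "failover":
        # One group per path.
        return [[p["name"]] for p in paths]
    # The remaining policies group paths that share one attribute.
    key_by_policy = {
        "group_by_node_name": "node",
        "group_by_prio": "prio",
        "group_by_serial": "serial",
    }
    key = key_by_policy[policy]
    groups = defaultdict(list)
    for p in paths:
        groups[p[key]].append(p["name"])
    return list(groups.values())

# Hypothetical example: four paths to one disk through two target nodes.
paths = [
    {"name": "sda", "node": "iqn.a", "prio": 50, "serial": "S1"},
    {"name": "sdb", "node": "iqn.a", "prio": 10, "serial": "S1"},
    {"name": "sdc", "node": "iqn.b", "prio": 50, "serial": "S2"},
    {"name": "sdd", "node": "iqn.b", "prio": 10, "serial": "S2"},
]

for policy in ("multibus", "failover", "group_by_node_name", "group_by_prio"):
    print(policy, group_paths(paths, policy))
```

With this sample data, grouping by node name yields two groups of two paths each, while grouping by priority puts `sda` and `sdc` together because both have priority 50.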

Failback

This setting specifies how multipath handles a path group that recovers from failure. Depending on the setting, multipath either does nothing or evaluates the priority of the recovered path group and switches to it if its priority is higher than that of the group currently in use.

  1. Manual (default) - multipath does not switch back automatically to a path group that has recovered. However, if one path group recovers and all other groups have failed, multipath switches to this last available group. This setting does not cause any break in transfer.
  2. Immediate - failed path groups are monitored, and as soon as multipath detects that a path group has recovered, it switches to that group, provided its priority is higher than that of the group currently in use. If the priorities are the same, multipath has no reason to switch.
  3. Custom value - the number of seconds that must pass after a path group recovers before multipath can switch to it. This setting is similar to Immediate, but instead of switching right away multipath waits the given number of seconds. In other words, if a path group with a higher priority than the current one recovers, multipath switches to it only after it has been available for the specified time.
  4. Follower - automatic failback is allowed only when the first path in a path group becomes active.
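
The decision described for the Manual, Immediate and Custom value settings can be sketched as a small Python function. This is a simplified model under stated assumptions: the function name and its parameters are hypothetical, the Follower setting is not modeled, and neither is the case where the recovered group is the only one left (that is a failover rather than a failback).

```python
def should_fail_back(failback, current_prio, recovered_prio, seconds_available=0):
    """Decide whether to switch to a path group that has recovered.

    failback is "manual", "immediate", or a number of seconds (custom value).
    """
    if failback == "manual":
        # Never switch back automatically.
        return False
    if recovered_prio <= current_prio:
        # No reason to switch to a group with equal or lower priority.
        return False
    if failback == "immediate":
        return True
    # Custom value: the group must have been available long enough.
    return seconds_available >= failback

print(should_fail_back("manual", 10, 50))                     # False
print(should_fail_back("immediate", 10, 50))                  # True
print(should_fail_back("immediate", 50, 50))                  # False
print(should_fail_back(30, 10, 50, seconds_available=12))     # False
print(should_fail_back(30, 10, 50, seconds_available=45))     # True
```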

Path selector

This option specifies the algorithm used to balance traffic across the paths in the active path group.

  1. Round-robin (default) - data is split across all paths in the active group, and the same amount of data is sent through each path. Multipath sends one part of the data to the first path, the next part to the second path, and so on. Each part has the same size unless the weighted version of round-robin is in use, in which case paths with a higher weight receive more data.
  2. Queue-length - the next piece of data is sent through the path with the shortest queue of data waiting to be sent.
  3. Service-time - similar to queue-length in that the next piece of data is sent to the path with the shortest queue of waiting data, but the size of that piece is chosen relative to the speed of the particular path.
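
The three selectors can be compared with a minimal Python sketch. All function names and the sample numbers are hypothetical. The service-time function is an assumption about how path speed could be taken into account (queued bytes divided by a relative throughput figure); it is a simplification and not a description of the real selector.

```python
from itertools import cycle

def round_robin(paths):
    """Yield paths in a fixed rotation: first, second, ..., first again."""
    return cycle(paths)

def pick_queue_length(queued_requests):
    """Pick the path with the fewest requests waiting to be sent."""
    return min(queued_requests, key=queued_requests.get)

def pick_service_time(queued_bytes, relative_throughput):
    """Pick the path with the smallest estimated time to drain its queue.

    Assumption: estimated time = queued bytes / relative throughput.
    """
    return min(queued_bytes, key=lambda p: queued_bytes[p] / relative_throughput[p])

rr = round_robin(["sda", "sdb"])
print([next(rr) for _ in range(4)])            # ['sda', 'sdb', 'sda', 'sdb']
print(pick_queue_length({"sda": 3, "sdb": 1}))  # sdb
# sda has more data queued but is assumed to be four times faster.
print(pick_service_time({"sda": 4096, "sdb": 2048}, {"sda": 4, "sdb": 1}))  # sda
```

The last call shows the difference between the two queue-based selectors: by queue size alone `sdb` would win, but once the assumed path speed is considered `sda` is expected to finish sooner.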

Path checker

This setting tells multipath how to check the state of a path.

  1. Direct I/O - reads the first sector of the disk without using any cache.
  2. Test Unit Ready (default) - uses the SCSI command “Test Unit Ready” to check whether the disk is available. The device's response to this command indicates whether it is accessible to the client application.
  3. EMC Clariion, RDAC storage controller and HP storage array - these settings are specific to particular hardware. Some hardware vendors provide custom path checker options, which can be used with the corresponding hardware.

Path priority routine (prio)

This setting selects the program used to obtain the priority of a path. A higher value means a higher priority. The priorities of the paths in a path group are summed, and the group with the highest priority is used when the currently active group fails.

  1. Const (default) - generates the same priority (value 1) for all paths. In effect, a path group has a higher priority if it contains more paths. This setting also means that the weighted round-robin algorithm is never used.
  2. Random - generates a random priority in the range 1 - 10 and assigns it to the path.
  3. SCSI-3 ALUA - the path priority is generated from the SCSI-3 ALUA status. The priority is assigned as follows:
    1. Both paths are active: both path priorities are set to 50.
    2. One path is active and one is in non-optimized state: the priority of the active path is 50 and the priority of the non-optimized path is 10.
    3. One path is active and one is in standby state: the priority of the active path is 50 and the priority of the standby path is 1.
    4. One path is active and one is in unavailable state: the priority of the active path is 50 and the priority of the unavailable path is 0.
    5. One path is active and one is in offline state: the priority of the active path is 50 and the priority of the offline path is 0.
    6. One path is active and one is in transitioning state: the priority of the active path is 50 and the priority of the transitioning path is 0.
  4. Vendor specific settings: EMC arrays, HP storage array, Hitachi HDS Modular storage arrays, NetApp arrays, RDAC storage controller - these settings can be used with the specific hardware. With these settings, multipath communicates with the hardware to generate an appropriate path priority, and quite possibly paths with faster transfer get a higher priority.
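
The ALUA priority values listed above, together with the rule that path priorities are summed per group, can be sketched in Python as follows. The state labels used as dictionary keys and the helper names are hypothetical; only the numeric values come from the list above.

```python
# Priority per ALUA state, using the values from the list above.
ALUA_PRIO = {
    "active": 50,
    "non-optimized": 10,
    "standby": 1,
    "unavailable": 0,
    "offline": 0,
    "transitioning": 0,
}

def group_priority(states):
    """Sum the priorities of all paths in one path group."""
    return sum(ALUA_PRIO[s] for s in states)

def best_group(groups):
    """Return the name of the path group with the highest summed priority."""
    return max(groups, key=lambda name: group_priority(groups[name]))

groups = {
    "A": ["active", "standby"],                # 50 + 1 = 51
    "B": ["non-optimized", "non-optimized"],   # 10 + 10 = 20
}
print(group_priority(groups["A"]))  # 51
print(best_group(groups))           # A
```

The same `group_priority` helper also illustrates the remark about Const: if every path has priority 1, the sum is simply the number of paths in the group.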

Queue disabling (flush_on_last_del)

This option disables queueing when the last path to a device is removed.

Path retry (no_path_retry)

This option specifies what happens when a path fails. It controls whether data is still queued while the path is failed. It is also possible to specify how many times multipath reattempts to send data before it fails the path.

  1. Disabled (fail) (default) - the path is immediately considered failed and no data is queued.
  2. Infinite (queue) - data is always queued without failing the path.
  3. Custom value - the number of attempts multipath makes before it fails the path.
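
The three values can be summarized in a small Python sketch that follows the description above. The function name, its parameters and the returned labels are hypothetical; this models only the decision as described here, not multipath's internal logic.

```python
def on_path_error(no_path_retry, attempts_so_far):
    """Return "queue" to keep data queued and retry, or "fail" to fail the path.

    no_path_retry is "fail", "queue", or a number of attempts (custom value).
    """
    if no_path_retry == "fail":
        return "fail"    # Disabled: fail immediately, queue nothing
    if no_path_retry == "queue":
        return "queue"   # Infinite: always keep queueing
    # Custom value: keep queueing until the attempt limit is reached.
    return "queue" if attempts_so_far < no_path_retry else "fail"

print(on_path_error("fail", 0))    # fail
print(on_path_error("queue", 99))  # queue
print(on_path_error(5, 3))         # queue
print(on_path_error(5, 5))         # fail
```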

No. of I/O request (rr_min_io and rr_min_io_rq)

This section discusses two settings, rr_min_io and rr_min_io_rq, because they are closely related. In the GUI they are called “No. of I/O request for BIO based multipath” (rr_min_io) and “No. of I/O request for request based multipath” (rr_min_io_rq). rr_min_io is the minimum number of I/Os that have to be performed on a path before multipath can switch to the next path in the same group; it applies only to BIO-based (block-based) multipath. rr_min_io_rq is the minimum number of requests that have to be routed to a path before multipath can switch to the next path; it applies only to request-based multipath. These settings make it possible to set the minimum amount of data that has to be sent through one path before multipath switches to the next one.

The default value of rr_min_io_rq is 1, and the default value of rr_min_io is 1000.
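
The effect of these settings on round-robin can be illustrated with a short Python sketch. The `assign_ios` helper and the path names are hypothetical; the sketch only shows how a minimum count per path changes the order in which paths are used.

```python
def assign_ios(paths, io_count, min_io):
    """Return the path used for each I/O when switching after min_io I/Os."""
    return [paths[(i // min_io) % len(paths)] for i in range(io_count)]

# With a minimum of 1, every I/O goes to the next path.
print(assign_ios(["sda", "sdb"], 4, 1))  # ['sda', 'sdb', 'sda', 'sdb']
# With a minimum of 2, each path receives two I/Os before the switch.
print(assign_ios(["sda", "sdb"], 6, 2))  # ['sda', 'sda', 'sdb', 'sdb', 'sda', 'sda']
```

A larger value therefore means fewer path switches, at the cost of coarser load balancing.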

Path weight

This option selects the method used to assign weights to the paths in a group. Two values are available:

  1. Uniform (default) - all paths have the same weight.
  2. Priorities - the weight of each path is calculated by multiplying the path priority by rr_min_io_rq (or by rr_min_io where that setting applies instead; see the previous section).


Path weight is most likely used by the weighted round-robin algorithm to calculate how much data should be sent through each path in the active path group. A path with a higher weight is considered faster than a path with a lower weight, so the algorithm sends more data through it to balance the load better. With the Uniform setting, weighted round-robin is effectively disabled because every path has the same weight. With the Priorities setting, weight has an actual effect only if multipath assigns different priorities to the paths: if the priorities are equal, the weights are equal too, because rr_min_io_rq can be set only for the whole multipath device and not for a single path.
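
The weight calculation can be sketched in Python as follows. The helper names are hypothetical, and two points are assumptions rather than statements from this page: that the Uniform weight equals rr_min_io_rq itself, and that the share of traffic a path receives is proportional to its weight.

```python
def path_weights(priorities, rr_min_io_rq, mode):
    """Return the weight of each path for the "uniform" or "priorities" mode."""
    if mode == "uniform":
        # Assumption: every path gets the same weight, rr_min_io_rq itself.
        return {path: rr_min_io_rq for path in priorities}
    # Priorities: weight = path priority * rr_min_io_rq.
    return {path: prio * rr_min_io_rq for path, prio in priorities.items()}

def traffic_share(weights):
    """Assumption: a path's share of traffic is proportional to its weight."""
    total = sum(weights.values())
    return {path: w / total for path, w in weights.items()}

prios = {"sda": 50, "sdb": 10}
print(path_weights(prios, 1, "uniform"))      # {'sda': 1, 'sdb': 1}
print(path_weights(prios, 1, "priorities"))   # {'sda': 50, 'sdb': 10}
print(traffic_share(path_weights(prios, 1, "priorities")))

# Equal priorities give equal weights, so the result matches Uniform.
print(traffic_share(path_weights({"sda": 1, "sdb": 1}, 1, "priorities")))
```

The last call mirrors the Const priority routine: with priority 1 on every path, the Priorities setting produces the same even split as Uniform.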