Changing the Disk I/O Sector Size

https://downloadcenter.intel.com/download/30065?v=t

ISDCT download: see the link above

https://www.intel.com/content/www/us/en/support/articles/000055357/memory-and-storage.html

Simple ISDCT Examples

 

Documentation

Troubleshooting

000055357

03/25/2020

This article helps customers of Intel® data center drives quickly provide SMART logs to support and update drive firmware. The tool for this is the Intel® SSD Data Center Tool (Intel® SSD DCT).

Customers can download the Intel® SSD Data Center Tool (Intel® SSD DCT).

Customers can also download the complete Intel® SSD Data Center Tool (Intel® SSD DCT) User Guide.

Please refer to specific command line examples below.

Windows*

  1. Open a command prompt window. Right-click Windows Start, then click Command Prompt (Admin).

    or

    Click Windows Start, then type cmd and right-click the Command Prompt tile, then click Run as administrator.

  2. Identify Intel® SSDs. Run the diskpart utility from C:\ as follows.

    Type diskpart and press Enter. This opens the DISKPART utility:

    DISKPART> list disk

    Disk 0 Online 298 GB

    Disk 1, Disk 2, and so on.

    Choose the Intel® SSD from the list above. In the example above it is Disk 0, so index 0 will be used in all subsequent commands.

    Type exit to leave the DISKPART utility.

  3. Use ISDCT to display SSD information:

    C:\isdct>isdct.exe show -intelssd 0

    Example results:

    ProductFamily and SerialNumber of the Intel SSD

    DevicePath:

    DeviceStatus: Healthy

    Firmware:

    FirmwareUpdateAvailable:

    Index:

    ModelNumber:

    ProductFamily

    SerialNumber:

  4. Display SMART attributes. For SATA drives:
    C:\isdct>isdct.exe show -smart -intelssd 0

    For PCIe* drives:
    C:\isdct>isdct.exe show -nvmelog smarthealthinfo -intelssd 0

    Copy and paste the screen output into a text file and provide it to support agents.

  5. For more severe issues, such as the SSD or ISDCT not working as expected, dump the SSD nlog files and the ISDCT system log and provide them to support agents. Run C:\isdct>isdct.exe dump -nlog -intelssd 0 for the nlog. For more detailed ISDCT system logs, run C:\isdct>isdct.exe set -system EnableLog=true, re-run the ISDCT commands that may be failing, and when done locate the nlog binary file and TDKI.log in the ISDCT folder. Then contact support and provide agents with these two files.
  6. Update the firmware to the latest version. To update multiple drives at once, you can script the command below in a loop, varying the index (0 in the example below): C:\isdct>isdct.exe load -intelssd 0
  7. More information on the latest firmware is available.
    Note: SSDs in RAID? Firmware can be updated as-is in systems with RST, RSTe, and VROC.

    For updating firmware with the supported hardware RAID cards below, first run: isdct.exe set -system EnableLSIAdapter='true'

    Supported RAID cards are:

    • MR SAS 9260-16i (RAID card chipset LSI-2108)
    • MR SAS 9270-8i (RAID card chipset LSI-2208)
    • MR SAS 9341-4i (RAID card chipset LSI-3008)
    • MR SAS 9341-8i (RAID card chipset LSI-3008)
    • MR SAS 9361-4i (RAID card chipset LSI-3108)

    For unsupported configurations, customers may use MegaCli/StorCli etc.

Linux*

  1. Open a terminal with root permission: sudo su

    Ensure that the command prompt begins with # and not $, as this indicates the user has root privileges.

  2. Display SMART attributes. For SATA drives:
    isdct show -smart -intelssd 0

    Copy and paste the screen output into a text file and provide it to support agents, or redirect it to a file as below:

    isdct show -smart -intelssd 0 > intelssdsmart.txt

    For PCIe* drives:
    isdct show -nvmelog smarthealthinfo -intelssd 0

  3. For more severe issues, such as the SSD or ISDCT not working as expected, dump the SSD nlog files and the ISDCT system log and provide them to support agents. Run isdct dump -nlog -intelssd 0 for the nlog, then isdct set -system EnableLog=true and isdct set -system LogFile=<path_to_logfile>, and re-run the ISDCT commands that may be failing. Locate the nlog binary file and the detailed ISDCT log, then contact support and provide both the nlog binary file and the log file at <path_to_logfile>.
  4. Update the firmware to the latest version. To update multiple drives at once, you can script the command below in a loop, varying the index (0 in the example below): isdct load -intelssd 0
    Note: SSDs in RAID? Firmware can be updated as-is in systems with RST, RSTe, and VROC.

    For updating firmware with the supported hardware RAID cards below, first run: isdct set -system EnableLSIAdapter='true'

    Supported RAID cards are:

    • MR SAS 9260-16i (RAID card chipset LSI-2108)
    • MR SAS 9270-8i (RAID card chipset LSI-2208)
    • MR SAS 9341-4i (RAID card chipset LSI-3008)
    • MR SAS 9341-8i (RAID card chipset LSI-3008)
    • MR SAS 9361-4i (RAID card chipset LSI-3108)

    For unsupported configurations, customers may use MegaCli/StorCli etc.
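The multi-drive update mentioned in step 4 can be sketched as a small script. This is illustrative only: the drive indexes are examples, and the -f flag (which skips isdct's confirmation prompt) should be verified against your ISDCT version's user guide.

```shell
#!/bin/sh
# Sketch: update firmware on several Intel SSDs in a loop.
# DRYRUN=1 (the default here) only prints the commands; set DRYRUN=0 to
# execute them for real.
DRYRUN=${DRYRUN:-1}
for index in 0 1 2 3; do                  # example drive indexes
  if [ "$DRYRUN" = "1" ]; then
    echo "isdct load -f -intelssd $index"
  else
    isdct load -f -intelssd "$index"      # -f assumed to skip the Y/N prompt
  fi
done
```

Check the output of isdct show -intelssd first so the loop only touches the indexes of the drives you actually intend to update.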

How to Configure Oracle Redo on the Intel PCIe SSD DC P3700

Back in 2011, I made the statement, “I have put my Oracle redo logs or SQL Server transaction log on nothing but SSDs” (Improve Database Performance: Redo and Transaction Logs on Solid State Disks (SSDs)). In fact, since the release of the Intel® SSD X25-E series in 2008, it is fair to say I have never looked back. Even though those X25-Es have long since retired, every new product has convinced me further still that, from a performance perspective, a hard drive configuration just cannot compete. This is not to say that there have not been new skills to learn, such as the configuration details explained here (How to Configure Oracle Redo on SSD (Solid State Disks) with ASM). The Intel® SSD 910 series provided a definite step up from the X25-E for Oracle workloads (Comparing Performance of Oracle Redo on Solid State Disks (SSDs)) and proved that concerns about write peaks were unfounded (Should you put Oracle Database Redo on Solid State Disks (SSDs)). Now, with the PCIe*-based Intel® SSD DC P3600/P3700 series, we have the next step in the evolutionary development of SSDs for all types of Oracle workloads.

Additionally, we have updates in operating system and driver support, so a refresh of the previous posts on SSDs for Oracle is warranted to help you get the best out of the Intel SSD DC P3700 series for Oracle redo.

NVMe

One significant difference in the new SSDs is the change in interface and driver from AHCI and SATA to NVMe (Non-Volatile Memory Express). For an introduction to NVMe, see this video by James Myers, and to understand the efficiency that NVMe brings, read this post by Christian Black. As James noted, high-performance, consistent, low-latency Oracle redo logging also needs high endurance, so the P3700 is the drive to use. With a new interface comes a new driver, which fortunately is included in the Linux kernel in the Oracle-supported Linux releases of Red Hat and Oracle Linux 6.5, 6.6, and 7.

I am using Oracle Linux 7.

Booting my system with both a RAID array of Intel SSD DC S3700 series and Intel SSD DC P3700 series shows two new disk devices:

First the S3700 array using the previous interface

  1. Disk /dev/sdb1: 2394.0 GB, 2393997574144 bytes, 4675776512 sectors
  2. Units = sectors of 1 * 512 = 512 bytes
  3. Sector size (logical/physical): 512 bytes / 4096 bytes
  4. I/O size (minimum/optimal): 4096 bytes / 4096 bytes

Second the new PCIe P3700 using NVMe

  1. Disk /dev/nvme0n1: 800.2 GB, 800166076416 bytes, 1562824368 sectors
  2. Units = sectors of 1 * 512 = 512 bytes
  3. Sector size (logical/physical): 512 bytes / 512 bytes
  4. I/O size (minimum/optimal): 512 bytes / 512 bytes

Changing the Sector Size to 4KB

As Oracle introduced support for 4KB sector sizes in release 11g R2, it is important to be on at least that release, or Oracle 12c, to take full advantage of SSDs for Oracle redo. However, 'out of the box' the P3700 presents a 512-byte sector size, as shown. We can use this 'as is' and set the Oracle parameter disk_sector_size_override to true; with this we can then specify a 4KB blocksize when creating a redo log file. Oracle will then use 4KB redo log blocks and performance will not be compromised.

As a second option, the P3700 offers a feature called 'Variable Sector Size'. Because we know we need 4KB sectors, we can set up the P3700 to present a 4KB sector size instead. This can then be used transparently by Oracle without additional parameters. It is important to do this before you have configured or started to use the drive for Oracle, as the operation destroys any existing data on the device.

To do this, first check that everything is up to date by using the Intel Solid State Drive Data Center Tool from https://downloadcenter.intel.com/download/23931/Intel-Solid-State-Drive-Data-Center-Tool. Be aware that after running the command it will be necessary to reboot the system to pick up the new configuration and use the device.

  1. [root@haswex1 ~]# isdct show -intelssd  
  2. – IntelSSD Index 0 –  
  3. Bootloader: 8B1B012D  
  4. DevicePath: /dev/nvme0n1  
  5. DeviceStatus: Healthy  
  6. Firmware: 8DV10130  
  7. FirmwareUpdateAvailable: Firmware is up to date as of this tool release.  
  8. Index: 0  
  9. ProductFamily: Intel SSD DC P3700 Series  
  10. ModelNumber: INTEL SSDPEDMD800G4  
  11. SerialNumber: CVFT421500GT800CGN  

Then run the following command to change the sector size. The parameter LBAFormat=3 sets it to 4KB, and LBAFormat=0 sets it back to 512 bytes.

  1. [root@haswex1 ~]# isdct start -intelssd 0 Function=NVMeFormat LBAFormat=3 SecureEraseSetting=2 ProtectionInformation=0 MetaDataSetting=0  
  2. WARNING! You have selected to format the drive!   
  3. Proceed with the format? (Y|N): Y  
  4. Running NVMe Format…  
  5. NVMe Format Successful.  

After it ran I rebooted; the reboot is necessary to perform an NVMe reset on the device, because I am on Oracle Linux 7 with a UEK kernel at 3.8.13-35.3.1. On Linux kernels 3.10 and above you can also run the following command with the system online to do the reset.

  1. echo 1 > /sys/class/misc/nvme0/device/reset  

The disk should now present the 4KB sector size we want for Oracle redo.

  1. Disk /dev/nvme0n1: 800.2 GB, 800166076416 bytes, 195353046 sectors  
  2. Units = sectors of 1 * 4096 = 4096 bytes  
  3. Sector size (logical/physical): 4096 bytes / 4096 bytes  
  4. I/O size (minimum/optimal): 4096 bytes / 4096 bytes  
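As a quick arithmetic check, the two fdisk listings (before and after the format) are consistent: the same 800,166,076,416-byte capacity divided by each sector size gives exactly the sector counts reported.

```shell
# Verify the sector counts from the fdisk listings above.
capacity=800166076416            # drive capacity in bytes
echo $(( capacity / 512 ))       # sectors at 512 B: 1562824368
echo $(( capacity / 4096 ))      # sectors at 4 KB: 195353046
```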

Configuring the P3700 for ASM

For ASM (Automatic Storage Management) we need a disk with a single partition. After giving the disk a gpt label, I use the following commands to create and check an aligned partition.

  1. (parted) mkpart primary 2048s 100%
  2. (parted) print
  3. Model: Unknown (unknown)
  4. Disk /dev/nvme0n1: 195353046s
  5. Sector size (logical/physical): 4096B/4096B
  6. Partition Table: gpt
  7. Disk Flags:
  8. Number  Start  End         Size        File system  Name     Flags
  9. 1      2048s  195352831s  195350784s               primary
  10. (parted) align-check optimal 1
  11. 1 aligned
  12. (parted)
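The align-check result can be reproduced with simple arithmetic. A sketch (the 1 MiB granularity is an assumption: parted's 'optimal' check uses the device's reported I/O geometry, which commonly resolves to 1 MiB):

```shell
# With 4096-byte logical sectors, a start sector of 2048 places the
# partition at an 8 MiB offset, a clean multiple of 1 MiB.
sector_size=4096
start_sector=2048
offset=$(( start_sector * sector_size ))
if [ $(( offset % 1048576 )) -eq 0 ]; then
  echo "1 aligned"
else
  echo "1 not aligned"
fi
```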

I then use udev to set the device permissions. Note: the scsi_id command can be run independently to find the device ID to put in the file, and the udevadm command can be used to apply the rules. Rebooting the system is useful during configuration to ensure that the correct permissions are applied on boot.

  1. [root@haswex1 ~]# cd /etc/udev/rules.d/
  2. [root@haswex1 rules.d]# more 99-oracleasm.rules
  3. KERNEL=="sd?1", SUBSYSTEM=="block", PROGRAM=="/usr/lib/udev/scsi_id -g -u -d /dev/$parent", RESULT=="3600508e000000000c52195372b1d6008", OWNER="oracle", GROUP="dba", MODE="0660"
  4. KERNEL=="nvme0n1p1", SUBSYSTEM=="block", PROGRAM=="/usr/lib/udev/scsi_id -g -u -d /dev/$parent", RESULT=="365cd2e4080864356494e000000010000", OWNER="oracle", GROUP="dba", MODE="0660"

Successfully applied, the oracle user now has ownership of the DC S3700 RAID array device and the P3700 presented by NVMe.

  1. [root@haswex1 rules.d]# ls -l /dev/sdb1
  2. brw-rw---- 1 oracle dba 8, 17 Mar  9 14:47 /dev/sdb1
  3. [root@haswex1 rules.d]# ls -l /dev/nvme0n1p1
  4. brw-rw---- 1 oracle dba 259, 1 Mar  9 14:39 /dev/nvme0n1p1

Use ASMLIB to mark both disks for ASM.

  1. [root@haswex1 rules.d]# oracleasm createdisk VOL2 /dev/nvme0n1p1
  2. Writing disk header: done
  3. Instantiating disk: done
  4. [root@haswex1 rules.d]# oracleasm listdisks
  5. VOL1
  6. VOL2

As the Oracle user, use the ASMCA utility to create the ASM disk groups.

[Screenshot: creating the ASM disk groups in ASMCA]

I now have 2 disk groups created under ASM.

[Screenshot: the two ASM disk groups]

Because of the way the disks were configured, Oracle has automatically detected and applied the sector size of 4KB.

  1. [oracle@haswex1 ~]$ sqlplus sys/oracle as sysasm
  2. SQL*Plus: Release 12.1.0.2.0 Production on Thu Mar 12 10:30:04 2015
  3. Copyright (c) 1982, 2014, Oracle.  All rights reserved.
  4. Connected to:
  5. Oracle Database 12c Enterprise Edition Release 12.1.0.2.0 - 64bit Production
  6. With the Automatic Storage Management option
  7. SQL> select name, sector_size from v$asm_diskgroup;
  8. NAME                           SECTOR_SIZE
  9. ------------------------------ -----------
  10. REDO                           4096
  11. DATA                           4096

SPFILES in 4K DISKGROUPS

In previous posts I noted Oracle bug “16870214 : DB STARTUP FAILS WITH ORA-17510 IF SPFILE IS IN 4K SECTOR SIZE DISKGROUP” and even with Oracle 12.1.0.2 this bug is still with us.  As both of my diskgroups have a 4KB sector size, this will affect me if I try to create a database in either without having applied patch 16870214.

With this bug, upon creating a database with DBCA you will see the following error.

[Screenshot: DBCA error during database creation]

The database is created and the spfile does exist, so it can be extracted as follows:

  1. ASMCMD> cd PARAMETERFILE
  2. ASMCMD> ls
  3. spfile.282.873892817
  4. ASMCMD> cp spfile.282.873892817 /home/oracle/testspfile
  5. copying +DATA/TEST/PARAMETERFILE/spfile.282.873892817 -> /home/oracle/testspfile

This spfile is corrupt and attempts to reuse it will result in errors.

  1. ORA-17510: Attempt to do i/o beyond file size
  2. ORA-17512: Block Verification Failed

However, you can extract the parameters by using the strings command and create an external spfile, or a spfile in a diskgroup with a 512-byte sector size. Once complete, the Oracle instance can be started.

  1. SQL> create spfile='/u01/app/oracle/product/12.1.0/dbhome_1/dbs/spfileTEST.ora' from pfile='/home/oracle/testpfile';
  2. SQL> startup
  3. ORACLE instance started
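The strings-based extraction step can be sketched as follows. The file contents here are fabricated stand-ins purely to show the mechanics; a real corrupt spfile would come from the ASMCMD cp step above, and the output would need review before being used as a pfile.

```shell
# Simulate a corrupt spfile: parameter text mixed with binary padding.
printf '*.db_name=TEST\0\0\0*.sga_target=4G\0\0' > /tmp/testspfile
# 'strings' pulls out the printable parameter lines, giving a usable pfile.
strings /tmp/testspfile > /tmp/testpfile
cat /tmp/testpfile
```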

Creating Redo Logs under ASM

Viewing the same disk groups from within the Oracle instance shows that the underlying sector size has been passed right through to the database.

  1. SQL> select name, SECTOR_SIZE BLOCK_SIZE from v$asm_diskgroup;
  2. NAME                           BLOCK_SIZE
  3. ------------------------------ ----------
  4. REDO                           4096
  5. DATA                           4096

Now it is possible to create a redo log file with a command such as follows:

  1. SQL> alter database add logfile '+REDO' size 32g;

…and Oracle will create a redo log automatically with an optimal blocksize of 4KB.

  1. SQL> select v$log.group#, member, blocksize from v$log, v$logfile where v$log.group#=3 and v$logfile.group#=3;
  2. GROUP#     MEMBER                                         BLOCKSIZE
  3. ---------- --------------------------------------------- ----------
  4. 3          +REDO/HWEXDB1/ONLINELOG/group_3.256.874146809  4096

Running an OLTP workload with Oracle Redo on Intel® SSD DC P3700 series

To put Oracle redo on the P3700 through its paces I used a HammerDB workload. The redo is set up with a standard production-type configuration, without the commit_write and commit_wait parameters. A test shows we are running almost 100,000 transactions per second with redo at over 500MB/second, which means we would be archiving almost 2TB per hour.

                     Per Second      Per Transaction   Per Exec   Per Call
Redo size (bytes):   504,694,043.7   5,350.6
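A back-of-envelope check of the archiving claim: extrapolating the per-second redo rate above to an hour lands close to the "almost 2TB per hour" figure.

```shell
# Redo bytes per second from the report above, extrapolated to one hour.
redo_per_sec=504694043                           # ~504.7 MB/s
per_hour_gib=$(( redo_per_sec * 3600 / 1073741824 ))
echo "${per_hour_gib} GiB per hour"              # 1692 GiB, ~1.8 TB
```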

Log file sync, even at this level of throughput, is just above 1ms:

Event           Waits        Total Wait Time (sec)   Wait Avg(ms)   % DB time   Wait Class
DB CPU                       35.4K                                  59.1
log file sync   19,927,449   23.2K                   1.16           38.7        Commit

…and the average log file parallel write shows an average disk response time of just 0.13ms:

Event                     Waits       %Time-outs   Total Wait Time (s)   Avg wait (ms)   Waits/txn   % bg time
log file parallel write   3,359,023   0            442                   0.13            0.12        2237277.09

There are six log writers on this system. As with previous blog posts on SSDs, I observed the log activity to be heaviest on the first three, and therefore traced the log file parallel write activity on the first one with the following method:

  1. SQL> oradebug setospid 67810;
  2. Oracle pid: 18, Unix process pid: 67810, image: oracle@haswex1.example.com (LG00)
  3. SQL> oradebug event 10046 trace name context forever level 8;
  4. ORA-49100: Failed to process event statement [10046 trace name context forever level 8]
  5. SQL> oradebug event 10046 trace name context forever, level 8;

The trace file shows the following results for log file parallel write latency to the P3700.

Log Writer Worker   Over 1ms   Over 10ms   Over 20ms   Max Elapsed
LG00                1.04%      0.01%       0.00%       14.83ms

Looking at a scatter plot of all the log file parallel write latencies, recorded in microseconds on the y-axis, clearly illustrates that any outliers are statistically insignificant and none exceed 15 milliseconds. Most of the writes are sub-millisecond, on a system that is processing many millions of transactions a minute.

[Scatter plot: log file parallel write latencies in microseconds]

A subset of iostat data shows that the device is also far from full utilization.

  1. avg-cpu:  %user   %nice %system %iowait  %steal   %idle
  2.           77.30    0.00    8.07    0.24    0.00   14.39
  3. Device:         wMB/s avgrq-sz avgqu-sz   await w_await  svctm  %util
  4. nvme0n1        589.59    24.32     1.33    0.03    0.03   0.01  27.47

Conclusion

As a confirmed believer in SSDs, I have long been convinced that most experiences of poor Oracle redo performance on SSDs have been due to errors in configuration, such as sector size, block size, and/or alignment, rather than the performance of the underlying device itself. By following the configuration steps I have outlined here, the Intel SSD DC P3700 series shows itself to be an ideal candidate to take Oracle redo to the next level of performance without compromising endurance.