Thursday, December 5, 2019

Effects of Advanced Computer Architecture

Question: Discuss the effects of advanced computer architecture.

Answer: CUDA stands for Compute Unified Device Architecture; it is a parallel computing platform and application programming interface created by NVIDIA (Di Carlo et al., 2013). Software developers and engineers use the platform to enable the graphics processing unit (GPU) for general-purpose processing. CUDA gives programs direct, virtual access to the GPU and its parallel computational elements, which execute units of work called compute kernels. The platform can be reached through GPU-accelerated libraries, industry-standard programming languages (C, C++, and Fortran), and compiler directives (for example, OpenACC). A CUDA core resembles a SIMD lane of a CPU rather than a full CPU execution core (Hundt, Hildebrandt & Schmidt, 2016). CUDA cores sit inside the GPU's multiprocessors, so a GPU with even 10 multiprocessors can accommodate thousands of CUDA cores.

Differences between CUDA and traditional (CPU) cores

The cores in a traditional processor handle all of the CPU's operations and the computer user's requests, including calculation, input/output, and interrupts (Hundt, Hildebrandt & Schmidt, 2016). Their instruction set is complex, and it needs heavy optimization to speed up instruction fetching. CPU cores have faster clock rates and large caches, and implementing any single instruction takes a great deal of control logic. They are therefore costly in terms of both utilization and function.

CUDA cores are found mostly in GPUs and have far less cache memory. Driving them requires simple, straightforward instructions (Di Carlo et al., 2013), and their clock rates are lower than those of CPU cores. Their primary benefit is that they are optimized for performing calculations in large groups, and the smaller caches make them less expensive than CPU cores. An SMX can also handle multiple contexts at once (in the manner of warps, SMT, and hyper-threading).
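To make the idea of a compute kernel and group calculation concrete, here is a minimal vector-addition sketch in CUDA C++. It is my own illustration rather than an example from the cited papers, and the sizes and names are arbitrary. Each lightweight thread handles one element, which is exactly the kind of simple, uniform work CUDA cores are built for:

#include <cstdio>
#include <cuda_runtime.h>

// Each thread adds one element; thousands of threads run this
// same instruction stream in parallel across the CUDA cores (SIMT).
__global__ void vecAdd(const float* a, const float* b, float* c, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) c[i] = a[i] + b[i];
}

int main() {
    const int n = 1 << 20;                      // one million elements
    const size_t bytes = n * sizeof(float);
    float *a, *b, *c;
    cudaMallocManaged(&a, bytes);               // unified memory keeps the sketch short
    cudaMallocManaged(&b, bytes);
    cudaMallocManaged(&c, bytes);
    for (int i = 0; i < n; ++i) { a[i] = 1.0f; b[i] = 2.0f; }

    int threadsPerBlock = 256;
    int blocks = (n + threadsPerBlock - 1) / threadsPerBlock;  // cover all n elements
    vecAdd<<<blocks, threadsPerBlock>>>(a, b, c, n);           // launch the compute kernel
    cudaDeviceSynchronize();                                   // wait for the GPU to finish

    printf("c[0] = %f\n", c[0]);                // expect 3.0
    cudaFree(a); cudaFree(b); cudaFree(c);
    return 0;
}

A CPU core would loop over the same million elements a few at a time; the GPU spreads them across thousands of CUDA cores in one launch.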
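The relationship between multiprocessors and core count can also be inspected at runtime. This short sketch (again my own illustration) asks the CUDA runtime for the properties of device 0:

#include <cstdio>
#include <cuda_runtime.h>

int main() {
    cudaDeviceProp prop;
    cudaGetDeviceProperties(&prop, 0);  // query device 0
    // Total CUDA cores = multiprocessor count x cores per multiprocessor
    // (the cores-per-SM figure depends on the GPU architecture).
    printf("Device: %s\n", prop.name);
    printf("Multiprocessors: %d\n", prop.multiProcessorCount);
    printf("Max threads per multiprocessor: %d\n", prop.maxThreadsPerMultiProcessor);
    return 0;
}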
WRF diagram explanation and best system in the diagram

The diagram compares the time elapsed by the processes involved in the WRF (Weather Research and Forecasting) model against the number of processors used. The sub-domain assigned to each processor is fixed at a 61 × 33 × 51 grid (Burakowski et al., 2016), so about 1,000 processors are needed to make up the 92 million grid points of the entire model. The components included in the comparison are the latency-to-performance ratio and the network bandwidth, and the small time-step ratio influences both the effectiveness and the communication overhead. The three systems plotted in the diagram are the Pittsburgh Supercomputing Center (PSC) Terascale Computing System (TCS), the PSC Cray XT3, and DataStar at the San Diego Supercomputing Center (SDSC). The PSC Cray XT3 provides the best value, with 2,068 compute nodes, each carrying 2.6 GHz AMD Opteron processors.

Effect of pipelining on hardware

Pipelining (also known as assembly-line operation) is a technique for decomposing a sequential process into sub-processes (Yildirim et al., 2016), each of which is completed in its own dedicated segment. Its effects on hardware are:

Change in content: Every instruction fed into the pipeline alters the contents of the register hardware. The destination register changes, and so do the intermediate registers that carry partial results between segments, so the pipeline's construction directly shapes what the hardware holds at any moment.

Imbalance of hardware performance: The most significant issue raised by pipelining is an imbalance among the stages (Yildirim et al., 2016). The clock period is fixed for the whole pipeline, so the faster stages cannot be sped up independently; the system effectively runs at the speed of its slowest stage.
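A small C++ sketch makes the slowest-stage effect visible. The stage latencies below are hypothetical numbers of my own choosing; the timing model is the classic one in which n instructions pass through a k-stage pipeline in about (k + n - 1) clock cycles, with the clock period stretched to fit the slowest stage:

#include <algorithm>
#include <iostream>

int main() {
    const double stage_ns[] = {1.0, 1.0, 2.5, 1.0};  // hypothetical stage latencies (ns)
    const int k = 4;                                  // number of pipeline stages
    const long n = 1000000;                           // instructions to execute

    // The clock period must accommodate the slowest stage.
    const double period = *std::max_element(stage_ns, stage_ns + k);

    // Pipelined: fill the pipe, then finish one instruction per cycle.
    const double pipelined = (k + n - 1) * period;

    // Sequential: every instruction passes through all stages back to back.
    double perInstr = 0;
    for (int i = 0; i < k; ++i) perInstr += stage_ns[i];
    const double sequential = perInstr * n;

    std::cout << "speedup = " << sequential / pipelined << "\n";
    return 0;
}

With these numbers the four-stage pipeline reaches a speedup of only about 2.2 instead of the ideal 4, because every stage is padded out to the 2.5 ns of the slowest one.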
References

Burakowski, E. A., Ollinger, S. V., Bonan, G. B., Wake, C. P., Dibb, J. E., & Hollinger, D. Y. (2016). Evaluating the climate effects of reforestation in New England, USA, using a Weather Research and Forecasting (WRF) model multi-physics ensemble. Journal of Climate.

Di Carlo, S., Gambardella, G., Indaco, M., Martella, I., Prinetto, P., Rolfo, D., & Trotta, P. (2013, May). A software-based self test of CUDA Fermi GPUs. In 2013 18th IEEE European Test Symposium (ETS) (pp. 1-6). IEEE.

Hundt, C., Hildebrandt, A., & Schmidt, B. (2016). rapidGSEA: Speeding up gene set enrichment analysis on multi-core CPUs and CUDA-enabled GPUs. BMC Bioinformatics, 17(1), 394.

Xue, M., Droegemeier, K. K., & Weber, D. (2007). Numerical prediction of high-impact local weather: A driver for petascale computing. Petascale Computing: Algorithms and Applications, 200, 103-124.

Yildirim, E., Arslan, E., Kim, J., & Kosar, T. (2016). Application-level optimization of big data transfers through pipelining, parallelism and concurrency. IEEE Transactions on Cloud Computing, 4(1), 63-75.