SQL order by logic

I have a table with two columns. One column contains text and the other contains integer values.
I need this table ordered by the integer value in descending order (higher values at the top), but rows whose integer value equals 0 also need to be ordered alphabetically among themselves. Let's say I have this table:
TextCol|IntCol|
-------|------|
Delta  |  0   |
Alpha  |  0   |
Beta   |  3   |
Sierra |  2   |
Gama   |  1   |
 
Now I need this:
TextCol|IntCol|
-------|------|
Beta   |  3   |
Sierra |  2   |
Gama   |  1   |
Alpha  |  0   |
Delta  |  0   |
 
What would be the SQL query for this?
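
As a minimal sketch of the ordering described above, the following uses Python's built-in sqlite3 module with an in-memory database. The table name `items` is hypothetical (the question does not name the table), and the query is one possible way to express the requirement, not an answer taken from the original thread.

```python
import sqlite3

# In-memory database holding the sample data from the question.
# The table name "items" is an assumption made for this sketch.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (TextCol TEXT, IntCol INTEGER)")
conn.executemany(
    "INSERT INTO items (TextCol, IntCol) VALUES (?, ?)",
    [("Delta", 0), ("Alpha", 0), ("Beta", 3), ("Sierra", 2), ("Gama", 1)],
)

# Sort by the integer column, highest first. The CASE expression adds an
# alphabetical tie-breaker that only applies to rows where IntCol = 0.
QUERY = """
SELECT TextCol, IntCol
FROM items
ORDER BY IntCol DESC,
         CASE WHEN IntCol = 0 THEN TextCol END ASC
"""

rows = conn.execute(QUERY).fetchall()
for text_value, int_value in rows:
    print(f"{text_value:<7}| {int_value}")
```

For this sample data, the simpler `ORDER BY IntCol DESC, TextCol ASC` gives the same row order; the CASE form only restricts the alphabetical tie-breaker to the zero-valued rows.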

Source: https://stackoverflow.com/questions/39079181

Updated: 2022-12-20 06:12

Accepted answer

There is a CUDA tool that helps pick good grid and block sizes: the CUDA Occupancy API.
In response to "The bigger I choose blockSize, the faster this code will execute": not necessarily. You want the sizes that give maximum occupancy (the ratio of active warps to the total number of possible active warps).
For additional information, see the answer to "How do I choose grid and block dimensions for CUDA kernels?".
Lastly, for NVIDIA GPUs of the Kepler generation or later, there are shuffle intrinsics that make reductions easier and faster. The article "Faster Parallel Reductions on Kepler" explains how to use them.
Update on choosing the number of threads:
You might not want to use the maximum number of threads if it results in less efficient use of the registers. From the link on occupancy:
For purposes of calculating occupancy, the number of registers used by each thread is one of the key factors. For example, devices with compute capability 1.1 have 8,192 32-bit registers per multiprocessor and can have a maximum of 768 simultaneous threads resident (24 warps x 32 threads per warp). This means that in one of these devices, for a multiprocessor to have 100% occupancy, each thread can use at most 10 registers. However, this approach of determining how register count affects occupancy does not take into account the register allocation granularity. For example, on a device of compute capability 1.1, a kernel with 128-thread blocks using 12 registers per thread results in an occupancy of 83% with 5 active 128-thread blocks per multi-processor, whereas a kernel with 256-thread blocks using the same 12 registers per thread results in an occupancy of 66% because only two 256-thread blocks can reside on a multiprocessor. 
So, as I understand it, increasing the number of threads per block can limit performance because of the way registers are allocated. However, this is not always the case, and you need to do the calculation yourself (as in the quoted passage above) to determine the optimal number of threads per block.
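
The arithmetic in the quoted passage can be reproduced with a short Python sketch. It is a simplified model that considers only the register and resident-thread limits quoted for compute capability 1.1 (8,192 registers and 768 threads per multiprocessor). The function name `estimate_occupancy` is hypothetical, and the model ignores register allocation granularity, shared memory, and the per-multiprocessor block limit, so treat it as an illustration and not as a replacement for the CUDA Occupancy API.

```python
def estimate_occupancy(threads_per_block, regs_per_thread,
                       regs_per_sm=8192, max_threads_per_sm=768):
    """Return (resident_blocks, occupancy) under a simplified model.

    Defaults are the compute capability 1.1 figures from the quoted text.
    Only register and thread limits are modeled (an assumption).
    """
    regs_per_block = threads_per_block * regs_per_thread
    # Whole blocks that fit in the register file.
    blocks_by_regs = regs_per_sm // regs_per_block
    # Whole blocks that fit under the resident-thread limit.
    blocks_by_threads = max_threads_per_sm // threads_per_block
    blocks = min(blocks_by_regs, blocks_by_threads)
    occupancy = blocks * threads_per_block / max_threads_per_sm
    return blocks, occupancy


# Largest per-thread register count that still allows 768 resident threads.
max_regs_for_full_occupancy = 8192 // 768

for block_size in (128, 256):
    blocks, occ = estimate_occupancy(block_size, regs_per_thread=12)
    print(f"{block_size}-thread blocks: {blocks} resident, occupancy {occ:.1%}")
```

With 12 registers per thread, this gives 5 resident blocks (640 of 768 threads, about 83%) for 128-thread blocks and 2 resident blocks (512 of 768 threads, about 67%, which the quoted text truncates to 66%) for 256-thread blocks.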
